Speedup indexing interactions with sqlite by crtschin · Pull Request #86 · haskell/HieDb

crtschin · 2025-11-29T23:00:53Z

I noticed that the sqlite interactions could be improved somewhat, and landed on this set of changes. Benchmarks are included below.

The first change is to use prepared statements. I added statement preparation for all queries that are repeated, which required shuffling some code around. The most obvious way to do this led me to this setup, which unfortunately also means that the database setup is always run, not just on the index and init commands as before. I could take a closer look and avoid this on request, if it's better to keep the old behavior for backwards compatibility.
The second change is to set PRAGMA synchronous = NORMAL. The default here is FULL. Since the journal was already set to WAL, the only difference is a loss of durability: in practice, a committed but not yet fsync'd transaction may be rolled back after a system failure. I think this is fine in hiedb's case, as the follow-up run would fix it automatically.
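
As a rough illustration of the second change, here is a minimal sketch using sqlite-simple. The helper names `configureConnection` and `currentSynchronous` are hypothetical and are not part of hiedb's API; the sketch only shows how the two pragmas can be applied to an open connection and how the effective level can be read back.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple

-- | Apply the journal and sync settings discussed above to an open connection.
-- 'configureConnection' is a hypothetical name used only for this sketch.
configureConnection :: Connection -> IO ()
configureConnection conn = do
  execute_ conn "PRAGMA journal_mode = WAL;"
  execute_ conn "PRAGMA synchronous = NORMAL;"

-- | Read back the effective level: 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA.
currentSynchronous :: Connection -> IO Int
currentSynchronous conn = do
  rows <- query_ conn "PRAGMA synchronous;"
  case rows of
    [Only level] -> pure level
    _            -> fail "unexpected result from PRAGMA synchronous"
```

Reading the value back after `configureConnection` is a cheap way to confirm that the setting took effect on a given connection, since `synchronous` is set per connection rather than stored in the database file.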

Benchmarks

These were run on GHC 9.6.7 on .hie files generated from HLS at commit 88ccebe0649f7c41be97d49a986bbfd4185982f6. Benchmarks were set up with hyperfine, using warmup and 10 runs, dropping page caches between runs. Runs are listed in reverse chronological order; (3) is the baseline.

Benchmark 1: Set `synchronous = NORMAL`
  Time (mean ± σ):      3.991 s ±  0.285 s    [User: 2.239 s, System: 0.782 s]
  Range (min … max):    3.694 s …  4.504 s    10 runs

Benchmark 2: Use prepared statements when indexing
  Time (mean ± σ):      4.652 s ±  0.336 s    [User: 2.272 s, System: 0.785 s]
  Range (min … max):    4.444 s …  5.465 s    10 runs

Benchmark 3: Use unreserved tag name when creating NameCache to prevent collisions with wired-in names (#85)
  Time (mean ± σ):      5.992 s ±  0.578 s    [User: 3.570 s, System: 0.769 s]
  Range (min … max):    5.718 s …  7.625 s    10 runs

Summary
  Set `synchronous = NORMAL` ran
    1.17 ± 0.12 times faster than Use prepared statements when indexing
    1.50 ± 0.18 times faster than Use unreserved tag name when creating NameCache to prevent collisions with wired-in names (#85)

This should be fine for hiedb's purposes. The loss of durability means that a committed but not yet fsync'd transaction may be rolled back. In hiedb's case, this would be automatically fixed on the follow-up run.

jhrcek

Nice! I like the performance improvements you were able to squeeze out 👍

But can we do that without changing the public api of the library?

jhrcek · 2025-12-01T17:39:16Z

src/HieDb/Create.hs


 {-| Initialize database schema for given 'HieDb'.
 -}
-initConn :: HieDb -> IO ()


We should probably not change the library's API functions without good reason.
E.g. this function is exposed in multiple versions of the library (https://hackage-content.haskell.org/package/hiedb-0.7.0.0/docs/HieDb-Create.html#v:initConn) and people using hiedb are probably using it.

I'm not sure there's an easy way to avoid changing the visible API of either initConn or deleteInternalTables. I think the types are wrong relative to setting up the prepared statements.

In this case, HieDb is the handle used to do operations on the sqlite file, so it's the likeliest place to put the prepared statements. But initConn is the function that sets up the tables, and it also takes a HieDb, so that HieDb can't already contain the prepared statements (short of something like lazy IO): sqlite doesn't allow preparing statements against tables that don't exist yet. There's a loop I need to break here.

WDYT? Am I overlooking an obvious option here?

Just realized after looking at deleteInternalTables that I could also keep this function for API purposes but otherwise not use it. I've pushed a commit that re-adds both functions.

jhrcek · 2025-12-01T17:40:44Z

src/HieDb/Create.hs

+  execute_ conn "PRAGMA optimize;"
  changes conn
-
-deleteInternalTables :: Connection -> FilePath -> IO ()


Same as above - probably not a good idea to change public api (unless we want to do major version bump, which I don't think is necessary).

https://hackage-content.haskell.org/package/hiedb-0.7.0.0/docs/HieDb-Create.html#v:deleteInternalTables

Approximately the same problem occurs here as with initConn. In this case I could keep this function for API purposes and have it keep calling the non-prepared deletions, but otherwise not call it ourselves in the library? And also perhaps decorate it with a DEPRECATED pragma?

crtschin

Leaving the function signatures aside, is it acceptable that setupHieDb/initConn is always called?

The code in this PR implies that Init is essentially a no-op relative to the other commands: the tables will always be created if the db file didn't already exist (looking at runCommand).

crtschin · 2025-12-01T22:12:55Z

src/HieDb/Create.hs


 {-| Initialize database schema for given 'HieDb'.
 -}
-initConn :: HieDb -> IO ()


I'm not sure there's an easy way to avoid changing the visible API of either initConn or deleteInternalTables. I think the types are wrong relative to setting up the prepared statements.

In this case, HieDb is the handle used to do operations on the sqlite file, so it's the likeliest place to put the prepared statements. But initConn is the function that sets up the tables, and it also takes a HieDb, so that HieDb can't already contain the prepared statements (short of something like lazy IO): sqlite doesn't allow preparing statements against tables that don't exist yet. There's a loop I need to break here.

WDYT? Am I overlooking an obvious option here?
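
The ordering constraint described here can be seen in a small sketch using sqlite-simple. The table name `demo_mods` and both helper names are made up for illustration and are not hiedb's schema or API; the point is only that preparing a statement fails until the table exists.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (try)
import Database.SQLite.Simple

-- | Create a hypothetical one-column table, used only for illustration.
createSchema :: Connection -> IO ()
createSchema conn =
  execute_ conn "CREATE TABLE IF NOT EXISTS demo_mods (hieFile TEXT)"

-- | Try to prepare an insert against the table. The statement is finalized
-- again immediately; only success or failure of preparation is reported.
-- 'Left' carries the SQLite error raised when the table is missing.
tryPrepare :: Connection -> IO (Either SQLError ())
tryPrepare conn =
  try (withStatement conn "INSERT INTO demo_mods (hieFile) VALUES (?)" (\_ -> pure ()))
```

Calling `tryPrepare` before `createSchema` should yield a `Left`, and a `Right ()` afterwards, which is why a handle that carries prepared statements can only be built once the schema is in place.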

crtschin · 2025-12-01T22:14:50Z

src/HieDb/Create.hs

+  execute_ conn "PRAGMA optimize;"
  changes conn
-
-deleteInternalTables :: Connection -> FilePath -> IO ()


Approximately the same problem occurs here as with initConn. In this case I could keep this function for API purposes and have it keep calling the non-prepared deletions, but otherwise not call it ourselves in the library? And also perhaps decorate it with a DEPRECATED pragma?


jhrcek · 2025-12-02T05:04:35Z

Sorry, I'm just a secondary maintainer and I'm kind of busy at work. I'd need to dive deeper to understand the implications of these changes. I'll reserve some time to look deeper into this PR later this week.

In the meantime, I have a couple of questions:

- Why do we need to use direct-sqlite? Is the prepared statement API provided by sqlite-simple not sufficient to implement this? See https://hackage.haskell.org/package/sqlite-simple-0.4.19.0/docs/Database-SQLite-Simple.html#g:17
- I'm somewhat uncomfortable with

  > the database setup is always run, not just on index and init commands

  Is there a way we could make the change more isolated to get 80% of the benefit with fewer changes (like only doing this prepared statement stuff in indexing, which we know is generally the bottleneck, unlike other places)?
- The use of ContT seems to introduce an inconsistency with some API methods. It may clean up code locally, but overall it seems to add another thing to the API mix, which makes it feel a bit more chaotic.

crtschin · 2025-12-02T09:38:25Z

> why do we need to use direct-sqlite - is the prepared statement api provided by sqlite-simple not sufficient to implement this? See https://hackage.haskell.org/package/sqlite-simple-0.4.19.0/docs/Database-SQLite-Simple.html#g:17

Good question! I did this to get rid of the argument checking that sqlite-simple does that's useful when writing queries, but less so when running queries. I realize I didn't benchmark this change. I'll do so!

> the database setup is always run, not just on index and init commands
>
> Is there a way we could make the change more isolated to get 80% of the benefit with fewer changes (like only do this prepared statement stuff in indexing, which we know is generally the bottleneck, unlike other places)

I'd have to experiment a bit, but I think it'll be hard without changing the API more than I already did. I can probably try an approach that only adds things, though.

> Sorry, I'm just a secondary maintainer and I'm kind of busy at work. I'd need to dive deeper to understand the implications of these changes. I'll reserve some time to look deeper into this PR later this week.

No worries! No rush at all. I'm also not fully satisfied with the API changes, so I'm keen to improve them.

Benchmarking shows a negligible difference between binding via sqlite-simple, which does additional checks on binding parameters, and direct-sqlite, which only calls the underlying sqlite3 function. Since there is no meaningful difference, stick to using helpers from the same library.

fendor · 2025-12-03T17:55:56Z

> Leaving the function signatures aside, is it acceptable that setupHieDb/initConn is always called?

~~Always called for each query individually or just whenever hiedb is used, i.e., when the db connection is opened?~~ (EDIT: it is the latter)
Afaict, this should be completely fine, our most important use case is HLS, which opens the connection once and then keeps it open, and running the code once is no issue for us.
Perhaps ghciwatch folks call hiedb on the cli for each request, then this change might be relevant to them. I doubt that this change has a huge effect in this case either, as it is only a few queries?

> I'd have to experiment a bit, but I think it'll be hard without changing the API more than I already did.

Keeping the API is in my opinion less of an issue. Sure, let's try to avoid unnecessary changes, but otherwise, a noticeable performance win is worth a breaking change to me :)

crtschin · 2025-12-04T18:05:15Z

> I'd have to experiment a bit, but I think it'll be hard without changing the API more than I already did.

For the curious, I gave this an attempt at crtschin@7cfe675. Though I (subjectively) like that less than this PR. So I'd prefer we stick to the setup here if acceptable.

> why do we need to use direct-sqlite - is the prepared statement api provided by sqlite-simple not sufficient to implement this? See https://hackage.haskell.org/package/sqlite-simple-0.4.19.0/docs/Database-SQLite-Simple.html#g:17

> Good question! I did this to get rid of the argument checking that sqlite-simple does that's useful when writing queries, but less so when running queries. I realize I didn't benchmark this change. I'll do so!

I did so, it didn't make much difference, so I removed the reference to direct-sqlite. Though do note that hiedb already depends on it indirectly.

> Perhaps ghciwatch folks call hiedb on the cli for each request, then this change might be relevant to them. I doubt that this change has a huge effect in this case either, as it is only a few queries?

I think the only noticeable behavioral change is if hiedb is called with a database filepath that doesn't exist, it will create that database file (even if doing only a readonly query), with the appropriate tables. In the previous scenario hiedb wouldn't create the database file, only execute the queries on that non-existing database.

Running ref-graph on a non-existing db file

# Before (file didn't exist prior, also does not exist after call):
> hiedb -D /tmp/non-existing-file ref-graph
hiedb: SQLite3 returned ErrorError while attempting to perform prepare "SELECT  mods.mod,    decls.hieFile,    decls.occ,    decls.sl,    decls.sc,    decls.el,    decls.ec,rmods.mod, ref_decl.hieFile, ref_decl.occ, ref_decl.sl, ref_decl.sc, ref_decl.el, ref_decl.ec FROM decls JOIN refs              ON refs.hieFile  = decls.hieFile JOIN mods          ON mods.hieFile  = decls.hieFile JOIN mods  AS rmods    ON rmods.mod = refs.mod AND rmods.unit = refs.unit AND rmods.is_boot = 0 JOIN decls AS ref_decl ON ref_decl.hieFile = rmods.hieFile AND ref_decl.occ = refs.occ WHERE ((refs.sl > decls.sl) OR (refs.sl = decls.sl AND refs.sc >  decls.sc)) AND ((refs.el < decls.el) OR (refs.el = decls.el AND refs.ec <= decls.ec))": no such table: decls

# After (file didn't exist prior, does exist after call):
> hiedb -D /tmp/non-existing-file ref-graph
<no output>

The total number of queries didn't actually increase; it decreased. The previous setup used prepared statements as well, but re-prepared them for every .hie file instead of once globally.
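
To make the "prepare once, reuse for every file" idea concrete, here is a minimal sketch with sqlite-simple. It is not the code from this PR: the table `refs_demo`, its columns, and the function names are hypothetical, and the input is a plain list standing in for indexed .hie files.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (forM_)
import Database.SQLite.Simple

-- | Insert rows for many "files" using one statement prepared up front,
-- instead of preparing the same SQL again for each file.
indexAll :: Connection -> [(FilePath, [String])] -> IO ()
indexAll conn files =
  withStatement conn "INSERT INTO refs_demo (hieFile, occ) VALUES (?, ?)" $ \stmt ->
    forM_ files $ \(file, occs) ->
      -- One transaction per file, all reusing the same prepared statement.
      withTransaction conn $
        forM_ occs $ \occ ->
          -- withBind binds the parameters and resets the statement afterwards.
          withBind stmt (file, occ) $ do
            -- An INSERT yields no rows; stepping once executes it.
            _ <- nextRow stmt :: IO (Maybe (Only Int))
            pure ()

-- | Count the rows in the hypothetical table.
countRows :: Connection -> IO Int
countRows conn = do
  [Only n] <- query_ conn "SELECT COUNT(*) FROM refs_demo"
  pure n
```

The statement is compiled once by `withStatement` and finalized when the callback returns, so the per-row cost is limited to binding, stepping, and resetting.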

> Keeping the API is in my opinion less of an issue. Sure, let's try to avoid unnecessary changes, but otherwise, a noticeable performance win is worth a breaking change to me :)

Performance is a feature after all :D

fendor

LGTM, thank you very much for optimising this crucial piece of the IDE infrastructure!
I merely have documentation requests, if you still have the head for it!

Waiting for @jhrcek's review, assuming he finds the time for it :)


wz1000

Nice performance improvement, a few stylistic suggestions but otherwise this seems ready.

wz1000 · 2025-12-05T09:22:23Z

src/HieDb/Types.hs

-newtype HieDb = HieDb { getConn :: Connection }
+runStatementFor :: (ToRow a, FromRow b) => StatementFor a -> a -> IO (Maybe b)
+{-# INLINE runStatementFor #-}
+runStatementFor (StatementFor statement) params = do


This doesn't seem to be used. Perhaps we should delete it. If we are to keep it, ideally the return variable wouldn't be unconstrained and we would have something like StatementFor a b -> a -> IO b, but we don't seem to need that functionality right now.

I do use it in this line.

I agree on including the output variable. In a different branch where I played around, I had that exact setup, but I omitted it for simplicity's sake. Might be good to explore that if the functions in Query.hs get the same benefit from this treatment.
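
A minimal sketch of the two-parameter variant discussed in this thread, assuming sqlite-simple's `Statement`, `withBind`, and `nextRow`. This is not the definition merged in the PR: the `StatementFor a b` shape follows the suggestion above, while `prepareLookup`, `closeStatementFor`, and the table `files_demo` are hypothetical names for illustration. The result stays `IO (Maybe b)`, as in the reviewed signature, rather than `IO b`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple

-- | A prepared statement tagged with its parameter type @a@ and row type @b@.
-- Both type parameters are phantom: they only constrain how it may be run.
newtype StatementFor a b = StatementFor Statement

-- | Bind parameters, fetch the first row (if any), then reset the statement.
runStatementFor :: (ToRow a, FromRow b) => StatementFor a b -> a -> IO (Maybe b)
runStatementFor (StatementFor stmt) params = withBind stmt params (nextRow stmt)

-- | Prepare a hypothetical lookup; the caller is responsible for closing it.
prepareLookup :: Connection -> IO (StatementFor (Only String) (Only Int))
prepareLookup conn =
  StatementFor <$> openStatement conn "SELECT size FROM files_demo WHERE name = ?"

-- | Finalize the underlying statement.
closeStatementFor :: StatementFor a b -> IO ()
closeStatementFor (StatementFor stmt) = closeStatement stmt
```

With the row type fixed at preparation time, a call such as `runStatementFor stmt (Only "A.hie")` needs no type annotation at the use site, which is the main practical difference from leaving the result type unconstrained.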


jhrcek

Thank you, tested this out with hls master and didn't notice any issues, so +1 to merge from me.

crtschin · 2026-01-08T13:45:44Z

Heya, gentle ping. Do I need to do anything else? I changed the API so I can make a follow-up to HLS perhaps?

fendor · 2026-01-09T08:14:29Z

Thank you for the ping, merged :)

A follow-up PR would be much appreciated! @jhrcek Do we have a policy for releasing hiedb? Should we just do a release?

jhrcek · 2026-01-09T16:42:45Z

I'm not aware of any policy. I can work on releasing a new version to Hackage tomorrow.

crtschin added 2 commits November 29, 2025 23:23

Use prepared statements when indexing

daa49c0

Set synchronous = NORMAL

a101c3b

This should be fine for hiedb's purposes. The loss of durability means that a committed but not yet fsync'd transaction may be rolled back. In hiedb's case, this would be automatically fixed on the follow-up run.

fendor requested review from fendor, jhrcek and wz1000 December 1, 2025 08:23

jhrcek requested changes Dec 1, 2025


crtschin commented Dec 1, 2025


crtschin added 2 commits December 1, 2025 23:34

Remove redundant setupHieDb calls

5b55fa3

It's already called as part of `withHieDb` that sets up and provides the `HieDb` handle.

Add backwards compatible API-functions

49941f6

crtschin added 3 commits December 2, 2025 23:02

Use sqlite-simple for binding parameters

e6bea36

Remove ContT from top-level type signature

01241ad

fendor approved these changes Dec 5, 2025


wz1000 requested changes Dec 5, 2025


Address review comments

7ed628e

- Adds some needed documentation
- Cleans up some helper function names
- Splits off statements into its own datatype.

jhrcek approved these changes Dec 9, 2025


crtschin requested a review from wz1000 December 10, 2025 09:35

Finish sentence that trailed off in comment

3272e6f

fendor merged commit 1544d7c into haskell:master Jan 9, 2026
0 of 14 checks passed
