Comments Page - SQLite async connection pool for high-performance

SQLite async connection pool for high-performance (github.com). Submitted by slaily 4 days ago

d1l 8 hours ago
This is strange on so many levels.
SQLite does not even do network I/O.
How does sharing a connection (and transaction scope) in an asyncio environment even work? Won’t you still need a connection per asyncio context?
Does sqlite3_open really take long compared to the inevitable contention for the write lock you’ll see when you have many concurrent contexts?
Does sqlite3_open even register in comparison with the overhead of the Python interpreter?
What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
- simonw 8 hours ago
  If you're talking to a 100KB SQLite database file this kind of thing is likely unnecessary, just opening and closing a connection for each query is probably fine.
  If you're querying a multi-GB SQLite database there are things like per-connection caches that may benefit from a connection pool.
  > What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
  Basically yes - aiosqlite works by opening each connection in a dedicated thread and then sending async queries to it and waiting for a response that gets sent to a Future.
  https://github.com/omnilib/aiosqlite/blob/895fd9183b43cecce8...
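To make the "connection in a dedicated thread, results delivered through a Future" idea concrete, here is a minimal sketch using only the standard library. It is not aiosqlite's actual code: `ThreadedConnection` and its methods are hypothetical names, and error handling and cancellation are simplified.

```python
import asyncio
import queue
import sqlite3
import threading


class ThreadedConnection:
    """Hypothetical wrapper: one sqlite3 connection owned by one worker thread."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._jobs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The connection is created and used only on this worker thread.
        conn = sqlite3.connect(self._path)
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            loop, future, sql, params = job
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
                # Futures are not thread-safe, so hand the result to the loop.
                loop.call_soon_threadsafe(future.set_result, rows)
            except Exception as exc:
                loop.call_soon_threadsafe(future.set_exception, exc)
        conn.close()

    async def execute(self, sql: str, params: tuple = ()) -> list:
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._jobs.put((loop, future, sql, params))
        return await future  # the event loop stays free while the thread works

    async def close(self) -> None:
        self._jobs.put(None)
        await asyncio.to_thread(self._thread.join)


async def demo() -> list:
    db = ThreadedConnection(":memory:")
    await db.execute("CREATE TABLE t (x INTEGER)")
    await db.execute("INSERT INTO t VALUES (?)", (42,))
    rows = await db.execute("SELECT x FROM t")
    await db.close()
    return rows


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The await in `execute` only suspends the calling task; the blocking SQLite call itself still runs synchronously, just on the worker thread.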
  d1l 7 hours ago
  That's even crazier - so you're using asyncio because you have a ton of slow network-bound stuff - but for your database access you are running every sqlite connection in its own thread and just managing those threads via the asyncio event loop?
  reactordev 3 hours ago
  Thread pooling for databases, whether network based, or disk based, is common. A lot of times it will be baked into your client, so the fact that you think it’s crazy means you’ve only dealt with clients that did this for you.
  For really large data sets, you can query and wait a few minutes before getting a result. Do you really want to await that?
  quietbritishjim 7 hours ago
  What is crazy about that?
  lttlrck 7 hours ago
  Of course I don't know what the parent is thinking, but my thought is: why can't it be entirely event loop driven? What are the threads adding here?
  (I don't know anything about that project and this isn't meant as a criticism of its design or a challenge - cos I'd probably lose :-) )
  eurleif 2 hours ago
  SQLite doesn't have a separate server process; it does all of the work for queries in your process. So it's intrinsically CPU-heavy, and it needs threads to avoid blocking the event loop.
  One way to look at it is that with a client-server database and an async client library, you have a thread pool in the database server process to do the heavy lifting, and async clients talk to it via TCP. With SQLite, you have that "server" thread pool in the same process instead, and async "clients" talk to it via in-process communication.
  mayli 6 hours ago
  Cause the sqlite-lib that python ships isn't async, and sqlite itself usually doesn't give an async API.
  maxbond 6 hours ago
  Python's asyncio is single threaded. If you didn't send them into a different thread, the entire event loop would block, and it would degenerate to a fully synchronous single threaded program with additional overhead.
  paulddraper 4 hours ago
  This is a common paradigm for blocking APIs (e.g. the sqlite driver)
  crazygringo 7 hours ago
  > If you're querying a multi-GB SQLite database
  In which case SQLite is probably the wrong tool for the job, and you should be using Postgres or MySQL that is actually designed from the ground up for lots of concurrent connections.
  SQLite is amazing. I love SQLite. But I love it for single-user single-machine scenarios. Not multi-user. Not over a network.
  Kranar 3 hours ago
  SQLite is a great database for organizing data in desktop applications, including both productivity software and even video games. It's certainly not at all unreasonable for those use cases to have files that are in the low GB and I would much rather use SQLite to process that data instead of bundling MySQL or Postgres into my application.
  simonw 7 hours ago
  Multi-GB is tiny these days.
  I didn't say anything about concurrent access. SQLite with WAL mode is fine for that these days for dozens of concurrent readers/writers (OK only one writer gets to write at a time, but if your writes queue for 1-2ms who cares?) - if you're dealing with hundreds or thousands over a network then yeah, use a server-based database engine.
  da_chicken an hour ago
  Multi GB is tiny, but that doesn't make SQLite magically better at large queries of multi GB databases. That's why DuckDB has been getting more popular.
  benjiro 26 minutes ago
  Sqlite != DuckDB... two totally different DB types. One is a row-based database, the other is column-based. Both run different workloads and both can handle extremely heavy workloads.
  brulard 7 hours ago
  I've always had trouble getting multiple processes write access to the sqlite file. For example, if I have a node.js backend working with that file and I try to access the file with a different tool (adminer, for example), it fails (file in use or something like that). Should it work? I don't know if I'm doing something wrong, but this is my experience with multiple projects.
  dathinab 4 hours ago
  There are multiple aspects to it:
  - sqlite is a bit like a RWLocked database: either any number of readers xor exactly one writer and no readers
  - but with WAL mode enabled readers and writers (mostly) don't block each other, i.e. you can have any number of readers and up to one writer (so normally you want WAL mode if there is any concurrent access)
  - if a transaction (including an implicit one from a single command without "begin", or e.g. upgrading from a read to a write transaction) is taking too long because a different process's write transaction is blocking it, SQLITE_BUSY might be returned.
  - in addition, file locks might be used by SQL bindings or similar to prevent multi-application access; normally you wouldn't expect that, but given that sqlite had an OPEN_EXCLUSIVE option in the past (which should be ignored by halfway modern implementations of it) I wouldn't be surprised to find that.
  - your file system might also prevent concurrent access to sqlite db files, this is a super obscure niche case but I have seen it once (in a shared server, network filesystem(??) context, probably because sqlite really doesn't like network filesystems often having unreliable implementations for some of the primitives sqlite needs for proper synchronization)
  as other comments pointed out enabling WAL mode will (probably) fix your issues
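A small sketch of the points above, assuming a database file on a local filesystem: it enables WAL, sets a busy timeout so a blocked writer waits instead of failing immediately with SQLITE_BUSY, and shows a reader proceeding while a write transaction is open. `open_db` and `demo` are illustrative names, and the 5000 ms timeout is an arbitrary choice.

```python
import os
import sqlite3
import tempfile


def open_db(path: str, busy_ms: int = 5000) -> sqlite3.Connection:
    # isolation_level=None puts the sqlite3 module in autocommit mode, so
    # transactions are controlled with explicit BEGIN/COMMIT statements.
    conn = sqlite3.connect(path, isolation_level=None)
    # Wait up to busy_ms for a lock before returning SQLITE_BUSY.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_ms)}")
    conn.execute("PRAGMA journal_mode = WAL")
    return conn


def demo() -> tuple:
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = open_db(path)
    reader = open_db(path)
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")

    writer.execute("BEGIN IMMEDIATE")  # take the write lock up front
    writer.execute("INSERT INTO t VALUES (2)")
    # In WAL mode the reader is not blocked; it sees the last committed state.
    seen_during_write = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
    writer.execute("COMMIT")
    seen_after_commit = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

    mode = reader.execute("PRAGMA journal_mode").fetchall()[0][0]
    writer.close()
    reader.close()
    return mode, seen_during_write, seen_after_commit


if __name__ == "__main__":
    print(demo())
```

The journal mode is stored in the database file, so other tools opening the same file will also use WAL; the busy timeout is per connection and has to be set by each client.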
  Groxx 7 hours ago
  They can't write concurrently, but generally speaking yes, they can: https://sqlite.org/faq.html#q5
  Your throughput will be much worse than a single process, but it's possible, and sometimes convenient. Maybe something in your stack is trying to hold open a writable connection in both processes?
  simonw 4 hours ago
  That is because the default SQLite journal mode is a rollback journal (DELETE), but for concurrent reads and writes you need to switch it to WAL.
  cyanydeez 6 hours ago
  PRAGMA journal_mode = WAL;
  naasking 7 hours ago
  > In which case SQLite is probably the wrong tool for the job
  Why? If all it's missing is an async connection pool to make it a good tool for more jobs, what's the problem with just creating one?
  nomel an hour ago
  It's a bit re-inventing the wheel, since solving all the problems that come with network access is precisely why those databases exist, and what they've already done.
  asyncpg is a nice python library for postgres.
  I think postgres releasing a nice linkable, "serverless" library would be pretty amazing, to make the need for abusing sqlite like this (I do it too) go away.
slaily 4 days ago
If you’re building Python async apps (FastAPI, background jobs, etc.) with SQLite, you’ll eventually hit two issues
- Opening/closing connections is fast, but not free—overhead adds up under load
- SQLite writes are globally locked
aiosqlitepool is a tiny library that adds connection pooling for any asyncio SQLite driver (like aiosqlite):
- It avoids repeated database connection setup (syscalls, memory allocation) and teardown (syscalls, deallocation) by reusing long-lived connections
- Long-lived connections keep SQLite's in-memory page cache "hot." This serves frequently requested data directly from memory, speeding up repetitive queries and reducing I/O operations
- Allows your application to process significantly more database queries per second under heavy load
Enjoy!
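For readers who want to see the general shape of the idea, here is a minimal pooling sketch built on the standard library. It is not the aiosqlitepool API: `TinyPool`, its methods, and the pool size are made up for illustration, and it skips health checks, timeouts, and safe handling of cancelled tasks.

```python
import asyncio
import contextlib
import os
import sqlite3
import tempfile


class TinyPool:
    """Illustrative pool of long-lived connections (hypothetical API)."""

    def __init__(self, path: str, size: int = 2) -> None:
        self.size = size
        self._idle: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            # check_same_thread=False lets worker threads use the connection;
            # the pool hands each connection to one task at a time.
            self._idle.put_nowait(sqlite3.connect(path, check_same_thread=False))

    @contextlib.asynccontextmanager
    async def connection(self):
        conn = await self._idle.get()  # waits if all connections are busy
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # return it instead of closing it

    async def fetchall(self, sql: str, params: tuple = ()) -> list:
        async with self.connection() as conn:
            # Run the blocking call off the event loop thread.
            return await asyncio.to_thread(
                lambda: conn.execute(sql, params).fetchall()
            )

    async def close(self) -> None:
        for _ in range(self.size):
            (await self._idle.get()).close()


async def demo() -> list:
    path = os.path.join(tempfile.mkdtemp(), "pool.db")
    with contextlib.closing(sqlite3.connect(path)) as setup:
        setup.execute("CREATE TABLE t (x INTEGER)")
        setup.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
        setup.commit()

    pool = TinyPool(path, size=2)  # created inside the running event loop
    results = await asyncio.gather(
        *(pool.fetchall("SELECT x FROM t WHERE x = ?", (i,)) for i in range(5))
    )
    await pool.close()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Five queries share two connections here; each connection keeps its page cache between queries because it is returned to the queue rather than closed.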
- slashdev 8 hours ago
  How does this help with the second issue, the write locks?
  ncruces 8 hours ago
  No idea if it applies, but one way would be to direct all writes (including any transaction that may eventually write) to a single connection.
  Then writers queue up, while readers are unimpeded.
  dathinab 4 hours ago
  if you enable WAL mode with sqlite then readers are not blocked by the writer, so only writers queue up, without needing any special case handling to achieve it
  (in general you _really_ should use WAL mode if using sqlite concurrently, you also should read the documentation about WAL mode tho)
- mostlysimilar 8 hours ago
  Around what amount of load would you say the overhead of opening/closing becomes a problem?
- manmal 6 hours ago
  Doesn’t SQLite have its own in-memory cache? Is this about having more control re cache size?
  dathinab 4 hours ago
  yes, per "open connection", hence why not closing+reopening connections all the time helps the cache ;)
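A quick way to see that the cache setting belongs to a connection rather than to the database file is to change `PRAGMA cache_size` on one connection and read it back on another. This is a small sketch; the 64000 KiB figure is an arbitrary example value and `cache_sizes` is an illustrative name.

```python
import os
import sqlite3
import tempfile


def cache_sizes() -> tuple:
    path = os.path.join(tempfile.mkdtemp(), "cache.db")
    a = sqlite3.connect(path)
    b = sqlite3.connect(path)

    # Negative values are a size in KiB; this asks for roughly 64 MB on `a` only.
    a.execute("PRAGMA cache_size = -64000")

    size_a = a.execute("PRAGMA cache_size").fetchone()[0]
    size_b = b.execute("PRAGMA cache_size").fetchone()[0]  # still the default
    a.close()
    b.close()
    return size_a, size_b


if __name__ == "__main__":
    print(cache_sizes())
```

Because the setting is not persisted, a connection that is closed and reopened starts again with the default size and an empty cache.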
bawolff 2 hours ago
> The primary challenge with SQLite in a concurrent environment (like an asyncio web application) is not connection time, but write contention. SQLite uses a database-level lock for writes. When multiple asynchronous tasks try to write to the database simultaneously through their own separate connections, they will collide. This contention leads to a cascade of SQLITE_BUSY or SQLITE_LOCKED errors.
I really don't get it. How would this help?
The benchmarks don't mention which journal mode sqlite is configured as, which is very suspicious as that makes a huge difference under concurrent load.
mayli 6 hours ago
FYI, I once had a few long-lived connections with WAL, and the WAL file just exploded in size. Turns out sqlite won't truncate the WAL if there are open connections.
- infamia 4 hours ago
  Using WAL2 should make that problem better. It has two WAL files it alternates between when making writes, so the system has an opportunity to checkpoint the WAL file not in use.
  https://sqlite.org/src/doc/wal2/doc/wal2.md
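With the regular WAL mode, one thing that may help with the file growth described above is to run an explicit checkpoint from time to time. The sketch below uses `PRAGMA wal_checkpoint(TRUNCATE)` on a single idle connection; with other connections holding read transactions open, the checkpoint can report busy and leave the file as it is. `demo` is an illustrative name.

```python
import os
import sqlite3
import tempfile


def demo() -> tuple:
    path = os.path.join(tempfile.mkdtemp(), "wal.db")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE t (x INTEGER)")
    for i in range(100):
        conn.execute("INSERT INTO t VALUES (?)", (i,))

    wal_path = path + "-wal"
    before = os.path.getsize(wal_path)
    # Returns (busy, wal_frames, checkpointed_frames); busy is 0 on success.
    busy, _frames, _done = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    after = os.path.getsize(wal_path)
    conn.close()
    return before, busy, after


if __name__ == "__main__":
    print(demo())
```

In a pooled setup, a periodic task that runs this pragma on one pooled connection is one possible arrangement, assuming readers finish their transactions often enough for the checkpoint to complete.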
bob1029 6 hours ago
I've been thinking about trying pre-serialization of SQLite commands to enable single-writer against a singleton SQLiteConnection using something like Channel<T> or other high performance MPSC abstraction. Most SQLite providers have an internal mutex that handles serialization, but if we can avoid all contention on this mutex things might go faster. Opening and closing SQLite connections is expensive. If we can re-use the same instance things go a lot faster.
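A rough Python analogue of that single-writer idea, with `asyncio.Queue` standing in for the multi-producer, single-consumer channel: many tasks enqueue statements and exactly one task applies them on one long-lived connection. All names here are hypothetical, and the sketch makes no claim about whether this is actually faster than relying on the driver's internal mutex.

```python
import asyncio
import sqlite3


def _write(conn: sqlite3.Connection, sql: str, params: tuple) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(sql, params)


async def writer_loop(conn: sqlite3.Connection, jobs: asyncio.Queue) -> None:
    """Drain the queue; this task is the only place that writes."""
    while True:
        job = await jobs.get()
        if job is None:  # shutdown sentinel
            return
        sql, params, done = job
        try:
            # Run the blocking write off the event loop thread.
            await asyncio.to_thread(_write, conn, sql, params)
            done.set_result(None)
        except Exception as exc:
            done.set_exception(exc)


async def submit(jobs: asyncio.Queue, sql: str, params: tuple = ()) -> None:
    # Producers never touch the connection; they only enqueue and wait.
    done = asyncio.get_running_loop().create_future()
    await jobs.put((sql, params, done))
    await done


async def demo() -> int:
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t (x INTEGER)")
    jobs: asyncio.Queue = asyncio.Queue()
    writer = asyncio.create_task(writer_loop(conn, jobs))

    await asyncio.gather(
        *(submit(jobs, "INSERT INTO t VALUES (?)", (i,)) for i in range(10))
    )
    await jobs.put(None)
    await writer

    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Since only the writer task ever uses the connection, writes are serialized by the queue rather than by lock contention between connections.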