Skip to content
← All posts

One transaction was too many

Posted on:May 2026

One transaction was too many

I store market data in DuckDB. One file, one writer. When an M1 bar comes in, I upsert it and derive all six higher timeframes: M5, M15, M30, H1, H4, D1, in the same transaction. If derive fails, raw rolls back. Half-written state is impossible. It’s the simplest correct design.

It worked until it didn’t. A 10k-row batch of M1→D1 derivation holds the write lock for seconds. A concurrent /query request queues behind it, the gateway times out, and the health check sees a 502.

What I tried

Split the transaction. The raw upsert and a derivation queue entry now commit together — one fast transaction. A background worker picks up the queue and derives in its own short transaction. Same atomicity guarantee, just narrower: raw committed <-> task durably queued. The derive can happen later.

The cost: a brief window where raw bars exist but derived bars don’t. Pass wait=true and the HTTP handler processes the queue inline, so you get read-after-write. A serial client never notices. A concurrent reader might see raw before derived lands.

Idempotent derivation. ON CONFLICT upserts make re-runs harmless. A WHERE NOT EXISTS ... source = 'raw' guard prevents derived rows from overwriting manually ingested ones. If you put an H1 bar in by hand, derivation won’t clobber it.

Crash safety. The queue table and the raw data commit in the same transaction. A kill mid-derive rolls back the derive but leaves the task pending. The worker re-runs on restart. After 5 failures it parks as error — no hot loop. The raw bars were already committed successfully.

What I’d still fix

Late-arriving data has no recovery path. An M1 bar that lands after its hour was derived gets stored raw, but the H1 stays stale. There’s no re-derive trigger.

The whole system runs on one process. One threading.Lock, one DuckDB file, one writer. The moment a second ingestor appears on another host, that invariant breaks and needs a coordinator. For a single VPS running one trader’s data, that’s fine. It’s also the thing that will eventually force a rewrite.