Technical · October 25, 2024

Scaling to 10M SKUs: A PostgreSQL Stress Test

Database cylinder next to ascending performance bar chart

We recently re-architected the IRONFEED core engine to handle catalogs an order of magnitude larger than anything we'd seen in production. The target was 10 million SKUs per workspace, refreshed at least every hour, with sub-second filter and mapping operations. This is what we learned running PostgreSQL at that shape.

Why PostgreSQL instead of a specialized store

We considered ClickHouse and a couple of columnar options. PostgreSQL won because feed workloads are not pure analytics — they're a mix of row-level upserts, bulk transforms and per-channel exports. A single mature engine that handles all three with strong correctness guarantees beats stitching two stores together, even when raw scan speed is lower.

The bottlenecks that actually mattered

Most "scaling" advice on the internet optimizes things that don't matter at this size. What actually moved our numbers:

  • Partitioning by workspace, not by date. Feed queries are almost always scoped to a single workspace; partitioning along that boundary cut working-set size by 100x.
  • Per-channel materialized projections. Instead of recomputing the Google / Meta / TikTok shape on every export, we materialize each projection and refresh it incrementally on source change.
  • BRIN indexes on append-only history tables. B-tree indexes were destroying write throughput on the change-log table. BRIN brought it back without hurting the time-range queries we actually run.
  • COPY over INSERT for imports. Obvious in hindsight, but the throughput gap at 10M rows is closer to 30x than 3x.

What didn't help

Aggressive connection pooling, exotic query rewrites, and switching to JSONB for the product blob all looked promising in microbenchmarks and did nothing in production. The wins came from data layout, not query tricks.

Results

End-to-end, the new engine refreshes a 10M-SKU workspace in under 12 minutes, with channel exports available within seconds of source change. P99 filter latency in the UI sits below 400ms. Most importantly, costs scale linearly with workspace size instead of with active query volume — which is what makes the pricing model sustainable.

What's next

We're working on a streaming change-data-capture path for partners that need sub-minute freshness on high-velocity catalogs (think flash sales and live inventory). Expect a write-up once it's been in production for long enough to have something honest to say.

Start shipping better feeds today

Join the teams using IRONFEED to power their product catalogs across every marketing channel.

No credit card required · 14-day free trial