How to audit your image pipeline without touching your codebase

May 13, 2026·6 min read·by Thomas Hart

A practical guide to measuring what your image pipeline is actually doing: format coverage, compression ratios, derivative gaps, and delivery delays — without touching production code.

Most image pipeline problems are invisible until they're expensive. A 4MB JPEG lands in your library, nothing breaks, but users on 4G are waiting three seconds for a grid tile that should have loaded in under one. No error, no alert, just a slow page.

An image pipeline audit is a structured review of how your application generates, stores, and delivers image derivatives (thumbnails, WebP conversions, and preview clips) and of whether that process completes reliably and efficiently. Most of the useful signal already exists in your storage, logs, and database.

What you're actually measuring

A useful audit covers four areas:

Format coverage. What fraction of your stored assets are still being served to end users as JPEG or PNG originals?
File size distribution. Are your derivatives actually smaller than your originals? By how much?
Derivative completeness. Which assets are missing thumbnails, WebP conversions, or preview clips?
Processing latency. How long does it take from upload to a usable derivative?

Start with derivative completeness. It's the fastest to query and it shows the biggest gaps.

Are you missing thumbnails for existing assets?

If your asset records track both an original storage key and a thumbnail key, this is a one-query audit:

SELECT
  COUNT(*) FILTER (WHERE thumbnail_key IS NULL)     AS missing_thumbnails,
  COUNT(*) FILTER (WHERE thumbnail_key IS NOT NULL) AS has_thumbnails,
  COUNT(*)                                           AS total
FROM assets
WHERE media_type = 'IMAGE'
  AND status = 'READY';

Run the same query scoped to uploads in the last 30 days and compare. If the missing share is higher in the recent window than across the whole library, the gap is growing, which means your post-upload processor is dropping work: the processing side is losing jobs, and the delivery layer is fine.
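The comparison can be sketched in shell, assuming you have copied the missing_thumbnails and total counts out of both runs. The function names missing_pct and gap_is_growing are hypothetical helpers for illustration, not part of any tool.

```shell
# Hypothetical helpers for comparing two runs of the completeness query.

# Print the share of assets missing a thumbnail, as a percentage.
# Usage: missing_pct <missing_count> <total_count>
missing_pct() {
  awk -v m="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "0.00"; exit }   # avoid dividing by zero on an empty window
    printf "%.2f\n", 100 * m / t
  }'
}

# Succeed (exit 0) when the recent window is worse than the whole library.
# Usage: gap_is_growing <all_time_pct> <recent_pct>
gap_is_growing() {
  awk -v all="$1" -v recent="$2" 'BEGIN { exit !(recent > all) }'
}

# Example with made-up counts: 50 of 5000 overall, 40 of 400 in the last 30 days.
# all_time=$(missing_pct 50 5000)
# recent=$(missing_pct 40 400)
# gap_is_growing "$all_time" "$recent" && echo "gap is growing: $all_time% -> $recent%"
```

Comparing percentages rather than raw counts matters here: a 30-day window will almost always have fewer missing thumbnails in absolute terms, even when its failure rate is much higher.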

If you track upload timestamp and thumbnail generation timestamp separately, you can calculate processing latency directly:

SELECT
  percentile_cont(0.50) WITHIN GROUP (ORDER BY thumb_generated_at - uploaded_at) AS p50,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY thumb_generated_at - uploaded_at) AS p95
FROM assets
WHERE thumb_generated_at IS NOT NULL
  AND uploaded_at > NOW() - INTERVAL '30 days';

In my testing, a P95 above 10 seconds on a serverless stack is worth digging into. The most common cause is the function being suspended mid-task.

What format are you actually delivering to users?

Your CDN or storage provider logs tell you what format users actually received, which is not always what you're generating. Pull a sample of request logs and count by content-type:

# From Cloudflare R2 log export or S3 access logs
grep '"GET"' access.log | awk '{print $12}' | sort | uniq -c | sort -rn

If image/jpeg dominates even though you're generating WebP derivatives, you're probably linking originals in your frontend instead of derivative keys. When I built out the asset delivery layer in Pixel Wand, this was the first thing I found wrong: the presigned URL helper was defaulting to the original key because I hadn't threaded the derivative key through the response shape. That cost me maybe two hours because the output looked completely correct in the upload response: I had both keys and the frontend was reading from the response, but it always picked the wrong one.

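The log format itself is its own problem worth calling out. R2 log exports and S3 access logs use different field orderings, and the awk '{print $12}' column for content-type is not reliable across log rotation configs or export formats. Standard S3 server access logs, as far as I know, don't record the response content type at all, so there you would count by the file extension of the requested key instead. Spot-check a few lines manually before you trust the counts.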
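One way to sidestep the column problem is to count content-type tokens wherever they appear on the line. This is a rough sketch: count_image_types is a hypothetical helper, and it assumes each log line contains the response content type as a literal image/... string. If your logs also record the request Accept header, those values get counted too, so the manual spot-check still applies.

```shell
# Hypothetical helper: count image/* tokens in a log file regardless of column.
# Assumes the response content type appears literally on each line (for example
# image/webp). Lines without an image/* token are ignored.
# Usage: count_image_types access.log
count_image_types() {
  grep -oE 'image/[A-Za-z0-9.+-]+' "$1" \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'   # normalize the padding uniq -c adds
}
```

The output is one line per type, count first, with the most common type at the top.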

Audit your file size distribution

Pull a representative sample of originals versus derivatives:

SELECT
  AVG(file_size)    AS avg_original_bytes,
  MIN(file_size)    AS min_original_bytes,
  MAX(file_size)    AS max_original_bytes,
  COUNT(*)          AS sample_size
FROM assets
WHERE media_type = 'IMAGE'
  AND status = 'READY'
  AND uploaded_at > NOW() - INTERVAL '90 days';

If your DAM or storage layer tracks derivative sizes separately, compare them. In my testing, a compression ratio below 5:1 on photographic content at thumbnail dimensions usually means the quality setting is too high or the width isn't constrained enough.
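If you can export per-asset sizes, the ratio check is a few lines of awk. This sketch assumes a tab-separated export with three columns (key, original bytes, derivative bytes); the file layout and both function names are assumptions for illustration.

```shell
# Print the compression ratio for one asset as "N:1".
# Usage: compression_ratio <original_bytes> <derivative_bytes>
compression_ratio() {
  awk -v o="$1" -v d="$2" 'BEGIN {
    if (d <= 0) { print "n/a"; exit }   # no derivative size recorded
    printf "%.0f:1\n", o / d
  }'
}

# List assets whose ratio falls below a threshold (default 5, per the text above).
# Input: tab-separated lines of <key> <original_bytes> <derivative_bytes>.
# Usage: low_ratio_assets sizes.tsv [min_ratio]
low_ratio_assets() {
  awk -F '\t' -v min="${2:-5}" '
    $3 > 0 && $2 / $3 < min { printf "%s\t%.1f\n", $1, $2 / $3 }
  ' "$1"
}
```

Listing the individual low-ratio assets is usually more useful than the average, because a handful of badly encoded files can hide behind a healthy mean.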

In my testing, a 640px WebP at quality 70-75 lands between 20 and 40KB for typical photography. The same crop as JPEG at default quality runs 80 to 150KB. If your grid tiles are averaging over 100KB, you're probably serving the wrong file. Thumbnails at that size should average around 30KB for photos and under 15KB for graphics. If yours run over 60KB, re-examine your encoding settings.
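If you have a local copy or a mounted sync of your thumbnails, find can flag oversized files directly. A sketch, assuming WebP thumbnails under one directory; oversized_thumbs is a hypothetical helper, and the 60 KiB default mirrors the threshold above.

```shell
# List .webp files larger than a size limit in KiB (default 60).
# Usage: oversized_thumbs <directory> [limit_kib]
oversized_thumbs() {
  find "$1" -type f -name '*.webp' -size +"${2:-60}"k | sort
}

# Example (the directory name is an assumption):
# oversized_thumbs ./thumbnails 60
```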

Sharp's quality settings don't always map the way you'd expect. In my experience, Sharp's webp({ quality: 80 }) does not produce the same byte output as cwebp -q 80 on the same input, which I put down to different encoder defaults, chroma subsampling among them. I've seen people spend real time tuning quality values based on what looked reasonable in the Sharp docs and end up with files twice as large as expected because they didn't account for those defaults. Test against actual output sizes, not assumed ratios.

Look at your processing error rate

Most post-upload pipelines swallow errors at the per-asset level because a failed thumbnail shouldn't break an upload. That's the right call for reliability, but it means failures are invisible unless you look.

Search your server logs for your thumbnail or derivative error prefix over the last 7 days:

grep "\[thumbnail\] background generation failed" /var/log/app/*.log | wc -l

Group by error message to find patterns. A spike of Input image exceeds pixel limit tells you users are uploading very large files your processor can't handle. A pattern of Connection reset tells you your processor is hitting storage timeouts under load.

A plain grep | sort | uniq -c on a log export will show you whether errors are random noise or a systematic gap in specific file types or sizes. Run it before you tune anything: fixing the wrong category is expensive.
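The grouping step can be sketched like this. It assumes the error message follows the log prefix after a colon, as in "[thumbnail] background generation failed: Connection reset"; that line layout and the function name group_thumbnail_errors are assumptions, so adjust the sed pattern to your own format.

```shell
# Hypothetical helper: count thumbnail failures by error message.
# Prints "<count><TAB><message>", most frequent first.
# Usage: group_thumbnail_errors /var/log/app/*.log
group_thumbnail_errors() {
  grep -h '\[thumbnail\] background generation failed' "$@" \
    | sed 's/.*background generation failed[: ]*//' \
    | sort | uniq -c | sort -rn \
    | awk '{ count = $1; $1 = ""; sub(/^ +/, ""); print count "\t" $0 }'
}
```

Messages that embed a per-asset ID or filename will not collapse into one group; strip those with an extra sed expression before the sort if your logs include them.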

Sample your actual output in a browser

Queries give you counts. Looking at the output gives you quality. Pick ten recently uploaded assets from your library, open your browser's network panel, and check:

What format is being served (the Content-Type response header)
The actual transferred size (after compression)
Whether the image looks correct at display size (orientation, color, no artifacts)

This takes five minutes and catches things queries miss: sideways thumbnails from EXIF orientation stripping, green-tinted WebP from a color space conversion bug, or over-compressed thumbnails that look fine at a glance but fall apart at 2x DPR.

Putting the numbers together

After running the queries above, a useful snapshot looks like this (substitute your own values):

Metric	Example value	Status
Total images (READY)	5,000	-
Missing thumbnails	50 (1.0%)	Review if >1%
Thumbnail generation P95	4.2s	Good if <10s
Avg original size	3.1 MB	-
Avg thumbnail size	28 KB	Good
Compression ratio	113:1	Good
Processing errors (7d)	12	Investigate patterns

I built Pixel Wand's asset processing stats page to show this snapshot without running queries manually: derivative coverage and error counts across the whole library in one place, mostly because I was tired of running these same queries every time something looked off.

Processing reliability beats format choice

Most image optimization advice focuses on format choice: use WebP, use AVIF, shave bytes off the encoding. In my experience the bigger wins are in reliability. I recently shipped a QStash fallback for Pixel Wand's thumbnail pipeline because the fire-and-forget pattern was silently dropping roughly 20% of thumbnail jobs during peak upload hours. No errors surfaced, no alerts fired; assets ended up in the library without thumbnails. A perfectly encoded WebP derivative that never gets generated because your serverless function got suspended is worse than a JPEG that loads. Get your derivative completeness above 99% first, then care about encoding format. Format optimization on top of a leaky pipeline is wasted effort.

FAQ

Do I need to re-upload my existing assets to fix derivative gaps?

No. A backfill script can iterate over assets where thumbnail_key IS NULL, download the original from storage, generate the derivative, and write the key back to the database. The originals are already there. The only cost is storage bandwidth and compute, which is cheap for a one-time pass.

My thumbnails exist but pages still load slowly. What's wrong?

Check whether your frontend is linking to the thumbnail key or the original. A common bug: the upload response returns both keys, the client stores both, but the image component always renders the original. Open the network panel and look at which URL is being fetched.

How often should I run this audit?

Monthly is reasonable for a growing library. The derivative completeness query is fast enough to run weekly as a cron job and alert on any gap above your threshold. Processing error rate should be monitored continuously if you have log aggregation set up.
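A weekly check can be as small as one function that turns the two counts into a pass or fail. This is a sketch under assumptions: check_thumbnail_gap is a hypothetical helper, the 1% default is only an example threshold, and the commented psql call assumes a DATABASE_URL variable and the assets schema used earlier.

```shell
# Exit non-zero and print an alert when the missing share exceeds the threshold.
# Usage: check_thumbnail_gap <missing_count> <total_count> [threshold_pct]
check_thumbnail_gap() {
  awk -v m="$1" -v t="$2" -v limit="${3:-1}" 'BEGIN {
    pct = (t > 0) ? 100 * m / t : 0
    if (pct > limit) {
      printf "ALERT: %.2f%% of assets missing thumbnails (threshold %s%%)\n", pct, limit
      exit 1
    }
    printf "OK: %.2f%% of assets missing thumbnails\n", pct
  }'
}

# Example cron usage (assumes psql and DATABASE_URL; adjust to your setup):
#   counts=$(psql "$DATABASE_URL" -At -F ' ' -c "SELECT COUNT(*) FILTER (WHERE thumbnail_key IS NULL), COUNT(*) FROM assets WHERE media_type = 'IMAGE' AND status = 'READY'")
#   check_thumbnail_gap $counts 1 || echo "send alert here"
```

Because the function signals failure through its exit status, it plugs into whatever alerting you already have: cron mail, a webhook call, or a CI job that goes red.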

What causes processing latency to spike?

On serverless platforms, the most common cause is the function being suspended mid-task. If your post-upload processing runs as a fire-and-forget call without a mechanism to keep the function alive, it can get frozen the moment the HTTP response is sent. There are two fixes: use Next.js after() to keep the function alive until processing finishes, or hand the work to an async queue (QStash, BullMQ) so it runs as its own job instead of riding on the upload request.

Run the derivative completeness query against your current library, note the percentage of assets missing thumbnails, and work backward from there.