How We Shipped a 13 MB Image Converter That Runs Entirely in Your Browser

A technical deep-dive into building SciZone: compiling ImageMagick, libwebp, libavif, libheif and exiv2 to 13 MB of WebAssembly, adaptive PSNR/SSIM quality search, and a memory-aware worker scheduler that keeps the UI at 60fps.

Early on, we asked ourselves a simple question: does this converter actually need a server?

The server would receive images, run a C++ library against them, and send results back. The hard part — the actual codec work — happens in a native library that runs just as well in WebAssembly as it does in a server process. So we removed the server. Everything runs in the user’s browser tab. No upload, no queue, no round-trip.

This post is the story of how we got there — what we compiled, the quality problem we had to solve, and the surprising number of ways a 1000-image batch can break.

If you want to see the result before reading about it, open scizone.dev and drop a folder of photos on the page. Come back when you’re curious how it works.

Why we didn’t go server-side

Server-side image conversion is the obvious path. sharp, imagemagick, libvips — mature, well-tested, trivial to deploy. Write a small service, expose a POST endpoint, done.

The problems surface when people actually use it:

Bandwidth is real money. 1 GB in + 500 MB out, multiplied by daily traffic, gets expensive fast. This is why every free converter caps batch sizes — not a technical limitation, a cost containment measure.

Upload time dominates the experience. On a home connection, uploading 100 photos takes longer than converting them. Your users spend most of their time watching an upload progress bar before anything useful happens.

Privacy is a promise you can’t prove. Even a well-intentioned service can’t guarantee that files didn’t end up in a crash log or a backup snapshot. “We don’t store your files” is a policy, not something verifiable from the outside.

You become the bottleneck. Scale is your problem. One slow batch degrades the experience for everyone.

Running client-side fixes all four: zero bandwidth cost, no upload wait, privacy that’s verifiable in DevTools, and the user’s own CPU doing the work.

What’s in the 13 MB binary

Getting a real image converter into the browser meant compiling the native codec stack with Emscripten. Here’s what ended up in the binary:

  • ImageMagick 7.1.2 — decodes basically every image format ever made: JPEG, PNG, TIFF, GIF, BMP, PSD, HEIF, and more
  • libwebp 1.5.0 — the WebP encoder, with SIMD paths enabled (we deliberately stayed on 1.5; 1.6’s wasm SIMD128 encode path produced corrupted YUV under our toolchain)
  • libavif 1.3.0 + libaom 3.13.3 — AVIF encode and decode, with tile-parallel encoding and the AV1 encoder enabled explicitly (libaom ships decoder-only by default)
  • libheif 1.21.2 + libde265 1.0.18 — HEIC/HEIF decoding; libde265 is the same HEVC decoder Apple uses in macOS Preview
  • libjpeg-turbo 3.1.4 — faster JPEG decoding than ImageMagick’s built-in path
  • libpng, giflib, libtiff — format-specific decoders for quality-critical paths
  • exiv2 0.28.8 — reads and writes EXIF, IPTC, and XMP metadata (for AVIF, metadata goes in as BMFF boxes directly through libavif since exiv2 can’t write BMFF)

Plus supporting libraries: libzstd, libdeflate, brotli, libxml2, and a few others.

The total compresses to about 5 MB over the wire, cached aggressively by a Service Worker so it loads once per device. After that, it’s there even offline.

The quality problem with fixed-quality converters

Most converters give you a quality slider. You pick 80, and that setting gets applied to everything.

This is a poor default because image complexity varies enormously. Quality 80 on a flat logo wastes bytes — you could cut the file in half with no visible change. Quality 80 on a high-detail portrait introduces artifacts — you needed 87 to stay above the “looks the same” threshold.

Our approach is to search for the right quality per image automatically. We ship two perceptual targets as presets:

  • Excellent (default) — PSNR ~42 dB on WebP, ~38 dB on AVIF. Visually indistinguishable from the original on natural photographs; tuned so AVIF can actually beat WebP on JPG sources rather than preserving every DCT artifact of the input.
  • Visually Lossless — PSNR ~44.5 dB on WebP, ~42 dB on AVIF. Archival-grade fidelity for users who need it, at the cost of compression ratio.

Both presets also require SSIM ≥ 0.95 so edges, textures, and gradients are preserved intact.

The algorithm:

  1. Find the hardest part of the image. Locate the highest-entropy region — the block with the most complex detail. This is the worst case for compression quality.
  2. Binary search the quality setting. Encode that block at various quality levels, measuring PSNR and SSIM, until we find the lowest setting that passes both thresholds.
  3. Full encode at the found quality. Run a final encode of the whole image with the selected codec (libwebp or libavif), then copy EXIF/XMP/ICC metadata onto the output.

The overhead is about 1.2–1.5× a single encode. The payoff: every output file is at the optimal size for its content — no wasted bytes on simple images, no artifacts on complex ones, and the user doesn’t have to pick a quality slider.
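The search in step 2 can be sketched as a plain binary search over the quality scale, assuming the metrics rise roughly monotonically with quality (approximately true for natural photographs). `encodeAndMeasure` is a hypothetical stand-in for encoding the high-entropy block and scoring it against the source; in the real pipeline that work happens inside the WASM module.

```javascript
// Minimal sketch of the per-image quality search (steps 1-3 above).
// Assumes PSNR/SSIM increase roughly monotonically with quality,
// which is what makes binary search valid here.
function findQuality(encodeAndMeasure, targetPsnr, targetSsim = 0.95) {
  let lo = 1, hi = 100, best = 100;
  while (lo <= hi) {
    const q = (lo + hi) >> 1;
    const { psnr, ssim } = encodeAndMeasure(q);
    if (psnr >= targetPsnr && ssim >= targetSsim) {
      best = q;     // passes both thresholds -- try a lower quality
      hi = q - 1;
    } else {
      lo = q + 1;   // too lossy -- raise quality
    }
  }
  return best;
}

// Toy stand-in: pretend both metrics rise linearly with quality.
const fake = q => ({ psnr: 20 + q * 0.3, ssim: 0.9 + q * 0.001 });
findQuality(fake, 42); // -> 74: the lowest q passing both thresholds
```

With roughly seven probe encodes on a single block, this is where the 1.2–1.5× overhead figure comes from: the probes touch only the worst-case region, not the whole image.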

What this actually buys you, in numbers

We swept 24 photographs (portraits, landscapes, flat lays, architecture, night scenes — deliberately diverse) through both presets. The goal was to land both formats at the same perceived quality and compare file sizes. Quality is measured by SSIM here; SSIM is a better proxy for “looks the same” than PSNR for natural photographs.

| Preset | WebP SSIM | AVIF SSIM | ΔSSIM | WebP ratio | AVIF ratio | AVIF vs WebP |
|---|---|---|---|---|---|---|
| Excellent (PSNR 42) | 0.9754 | 0.9721 | 0.003 | 1.68× | 2.92× | −45% |
| Visually Lossless (PSNR 44.5 / 44) | 0.9845 | 0.9807 | 0.004 | 1.24× | 2.16× | −45% |

Two things to call out. First, the two formats land within 0.005 SSIM of each other on every preset — the headline size win is honest. Second, AVIF is consistently ~45% smaller than WebP at matched quality, higher than the 20–30% commonly cited for AVIF vs WebP. That isn’t libavif magic; it’s what happens when you pair a per-image adaptive PSNR/SSIM search with tile-parallel encoding, so AVIF can actually hit its quality ceiling in seconds rather than making you choose between speed and file size.

The memory-aware worker scheduler

A single WebAssembly instance can only use one CPU core, and running it on the main thread freezes the UI. The naive fix is Web Workers — one worker per logical CPU core (navigator.hardwareConcurrency), each running its own WASM instance. That works fine when every image is roughly the same size. It falls apart the moment someone drops a mix of 1 MP JPEGs and 100 MP scans into the same batch: spin up eight workers that each want a 2 GB wasm heap, and the browser tab dies.

We replaced the fixed-size pool with a memory-aware scheduler. Per file:

  1. Peek dimensions cheaply via createImageBitmap on the main thread — no WASM decode yet.
  2. Estimate the heap footprint from pixel count: roughly px × 7 + 80 MB for AVIF, px × 5 + 30 MB for WebP. The fixed overhead is the static codec footprint; the per-pixel term is the decoded RGBA plus working buffers.
  3. Gate dispatch on the device memory budget (navigator.deviceMemory × 0.5, floor 500 MB). If the next file doesn’t fit, the scheduler skips ahead to a smaller file behind it rather than head-of-line blocking.
  4. Dynamically split cores across in-flight encodes. A lone encode grabs every core; two split 50/50; four split 25/25/25/25. Inside the wasm, libaom uses those threads for tile-parallel AVIF encoding (tileColsLog2/tileRowsLog2 picked adaptively per thread budget with a 512 px minimum tile floor, so small images don’t get shredded for a compression penalty).
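The scheduler's arithmetic is simple enough to sketch directly. The constants below are the estimates from the list above; `deviceMemoryGb` stands in for `navigator.deviceMemory`, and `tileColsLog2` shows one way the 512 px tile floor could be enforced — a sketch under those assumptions, not the production code.

```javascript
const MB = 1024 * 1024;

// Step 2: rough heap footprint per encode (bytes), from pixel count.
// Fixed term = static codec footprint; per-pixel term = decoded RGBA
// plus working buffers (the article's estimates).
function estimateHeapBytes(pixels, format) {
  return format === 'avif' ? pixels * 7 + 80 * MB
                           : pixels * 5 + 30 * MB;
}

// Step 3: memory budget -- half of reported device memory, 500 MB floor.
function memoryBudgetBytes(deviceMemoryGb) {
  return Math.max((deviceMemoryGb || 0) * 1024 * 0.5, 500) * MB;
}

// Step 4: split cores evenly across in-flight encodes, at least 1 each.
function threadsPerEncode(cores, inFlight) {
  return Math.max(1, Math.floor(cores / Math.max(1, inFlight)));
}

// Tile split for libaom: keep doubling the column count while there are
// threads to feed AND every tile column stays at least 512 px wide, so
// small images don't get shredded for a compression penalty.
function tileColsLog2(widthPx, threads) {
  let log2 = 0;
  while ((1 << (log2 + 1)) <= threads && widthPx >> (log2 + 1) >= 512) log2++;
  return log2;
}
```

For example, a 100 MP AVIF encode estimates to roughly 780 MB of heap, so on a 4 GB-budget machine it runs next to at most one or two siblings, while a batch of 1 MP thumbnails dispatches at full width.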

A few other things that bit us along the way:

Cross-origin isolation is non-negotiable. pthread-backed wasm needs SharedArrayBuffer, which requires the page to serve COOP: same-origin + COEP: require-corp. Without those headers, pthread_create silently returns EAGAIN and every encode collapses to a single thread — you get correct output and 1/N the throughput, which is the worst failure mode to catch because everything still “works.”
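Concretely, the two headers look like this (shown here as nginx directives; any server or CDN equivalent works):

```nginx
# Required for SharedArrayBuffer, and therefore for wasm pthreads
add_header Cross-Origin-Opener-Policy "same-origin" always;
add_header Cross-Origin-Embedder-Policy "require-corp" always;
```

At runtime, `crossOriginIsolated === true` in the DevTools console confirms the headers took effect — a cheap check worth wiring into a startup assertion, given how silent the failure mode is.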

Memory grows over time. Emscripten’s allocator doesn’t fully release heap between allocations. On long-running workers, this adds up to a slow leak. Solution: recycle each worker after 32 jobs. Cold-start is ~100 ms with the WASM module cached — cheap enough to do freely.
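The recycling policy is only a few lines. `createWorker` here is a hypothetical factory — in the real app it spawns a Web Worker running the WASM module; in this sketch it is anything producing an object with `exec` and `terminate`.

```javascript
// Recycle-after-N-jobs policy: terminating the worker discards its
// entire wasm heap, which is the only reliable way to reclaim memory
// Emscripten's allocator never fully releases.
const JOBS_PER_WORKER = 32;

function makeRecyclingPool(createWorker) {
  let worker = createWorker();
  let jobs = 0;
  return {
    run(job) {
      if (jobs >= JOBS_PER_WORKER) {
        worker.terminate();      // drop the leaked heap with the worker
        worker = createWorker(); // ~100 ms cold start with the wasm cached
        jobs = 0;
      }
      jobs++;
      return worker.exec(job);
    },
  };
}
```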

Transferable buffers are essential. Passing an ArrayBuffer with the transfer list moves it to the worker without copying. For a 50 MB TIFF, the difference between copy and transfer is the difference between a smooth UI and a stalled one.
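A quick way to see the difference: after a `postMessage` with a transfer list, the sender's `ArrayBuffer` is detached rather than cloned. (`MessageChannel` stands in for `worker.postMessage` so the sketch is self-contained.)

```javascript
// Moving a buffer via a transfer list is O(1); cloning a 50 MB buffer
// is not. Detachment on the sender side proves no copy was made.
const buf = new ArrayBuffer(50 * 1024 * 1024); // pretend: a 50 MB TIFF
const { port1, port2 } = new MessageChannel();

port1.postMessage(buf, [buf]); // transfer list: move, don't copy
console.log(buf.byteLength);   // 0 -- detached on the sender side

port1.close();
port2.close();
```

With a real worker the call is the same shape: `worker.postMessage(buf, [buf])`. Forgetting the second argument still works — it just silently falls back to a full structured clone.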

IndexedDB as a safety net. Conversion results go into IndexedDB before the ZIP builder. If the tab crashes mid-batch, everything that finished is recoverable.

OffscreenCanvas for previews. Thumbnail generation happens in the worker, so the main thread never touches raw pixel buffers — it just receives a rendered thumbnail.

What breaks at 1000+ images

Running large batches surfaced problems that don’t appear in demos:

Memory at the extremes. Very large inputs — 200+ MB TIFF scans, gigapixel panoramas — can push a worker’s heap toward the wasm32 4 GB ceiling. The scheduler’s footprint estimation plus per-file memory budget means a 100 MP TIFF encode runs by itself rather than next to seven siblings competing for the same heap.

ZIP streaming. Building a 10 GB ZIP in memory works until it doesn’t. We stream ZIP entries out as each file completes, so the browser’s save dialog opens long before the last image finishes.
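The shape of the streaming approach, library-agnostic: completed conversions flow through an async generator and the ZIP writer consumes each entry as it lands, instead of buffering the whole batch. `writeZipEntry` is a hypothetical stand-in for a streaming ZIP library's per-entry API.

```javascript
// Yield each converted file as soon as its encode completes.
async function* convertAll(files, convert) {
  for (const f of files) yield { name: f.name, data: await convert(f) };
}

// Feed entries to the ZIP writer incrementally; peak memory is one
// entry, not the whole archive.
async function streamToZip(files, convert, writeZipEntry) {
  let bytes = 0;
  for await (const entry of convertAll(files, convert)) {
    writeZipEntry(entry.name, entry.data); // flushed now, not at the end
    bytes += entry.data.length;
  }
  return bytes;
}
```

Because entries hit the writer as they finish, the browser's save dialog can open after the first file rather than after the last one.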

Cancellation. If a user cancels a batch, cancellation has to work without corrupting a currently-encoding worker. Our approach: terminate and recreate the worker. Simple, and the restart is fast enough to be unnoticeable.

Quality warnings. Some images can’t hit the PSNR/SSIM thresholds even at maximum quality — typically very high-ISO noise or heavily pre-compressed files. We surface a visible warning rather than silently shipping a file that missed the target.

Building it all

The Emscripten build pipeline isn’t trivial, but it’s more manageable than it sounds.

build_deps.sh compiles each third-party library via Emscripten’s emconfigure/emmake wrappers and installs them into a shared sysroot. Most libraries just work; a few need configure-flag adjustments. build.sh compiles our own C++ against the sysroot, links everything statically, and emits imgproc.js + imgproc.wasm.

For incremental updates, scripts/rebuild_dep.sh rebuilds a single dependency in a few minutes rather than the full ~15-minute clean build. A CI check flags any size regression above 300 KB so we notice when an upstream bump starts pulling in bloat.

Three optimization wins we came back to: re-enabling libwebp’s SIMD paths cut JPEG encode time by 14.5%. Enabling libde265’s SSE4.1 backend cut HEIC decode time by 18%. Switching AVIF encode from libaom’s internal threading to tile-parallel encoding with our pthread pool took AVIF from 62 s to under 10 s on a 12-JPG alpha_test folder — a 6× speedup with no quality change.

What’s coming next

A few things on the roadmap we’re genuinely excited about:

Animated WebP and AVIF from GIF and APNG inputs. Both codecs support animation; the pipeline doesn’t yet.

A native CLI and MCP server sharing the same C++ core — the same conversion quality available from a terminal or an LLM tool chain, not just a browser tab.

RAW support via libraw. The most-requested missing format.

The takeaway

Browsers in 2026 can run the full native image-processing stack. A well-tuned WebAssembly binary isn’t meaningfully slower than a native process for most image workloads — the only real cost is a one-time ~5 MB download, which the Service Worker caches for every visit after the first.

If you’re building anything image-adjacent and your first instinct is “I’ll spin up a processing server,” ask whether that server actually needs to exist. Often it doesn’t.

The result is live at scizone.dev. Drop a folder of photos and watch your browser do the work.