Why we made local-first the default — and what it cost us.

Cloud inference is faster to ship, cheaper to support, and unbeatable for the latest frontier models. We made the harder choice and put a local model behind every default action. Here's the math, the missteps, and the moment we knew it was right.

The Vehla team
Ottawa

Cloud inference is the easiest version of a Mac AI app to ship. You add a key field, call an API, stream the result, and move on.

Vehla takes the harder path by making Local AI a first-class setting. The app defaults to a Gemma 4 model for local inference, and cloud providers stay available when users want a frontier model or already have their own keys.

The objection list, scored honestly

Every objection we'd written against shipping local-first turned out to be smaller than we thought:

  • "Local models are dumber." Sometimes true for complex code and reasoning, much less true for everyday rewriting, summarizing, tone changes, and extraction.
  • "The download is too big." Local models are measured in gigabytes, so Vehla makes model choice explicit and shows download progress.
  • "Apple Silicon is required." 84% of our trial users are on M-series already, and Local AI is best when the hardware is designed for it.
  • "It'll be slower." Local speed depends on the Mac and selected model, but it avoids network latency and keeps sensitive text on the device.

What it actually cost

The real costs weren't engineering. They were the soft ones nobody puts on the roadmap:

The MLX runtime forced us to ship a 49 MB binary instead of a 6 MB one. That spooked one reviewer enough to make a YouTube video about "bloat". It's a fair complaint, and we accept it.

The main product cost is clarity. Users need to know which model is installed, which model is active, and when Local AI is being used. That is why Vehla shows active AI status in the palette and settings.

Some power users wanted a smaller cloud-only mode for older Macs. We added one. It costs us a settings row.
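For readers who think in code, here is a minimal sketch of how a local-first default with a cloud opt-in and a cloud-only mode could be resolved. It is an illustration under stated assumptions, not Vehla's actual implementation: the settings fields, function names, and the "gemma-4" identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AISettings:
    # Hypothetical fields for illustration; not Vehla's real settings schema.
    local_model: Optional[str] = "gemma-4"  # installed local model, None if not downloaded
    cloud_api_key: Optional[str] = None     # user-supplied provider key, if any
    prefer_cloud: bool = False              # user explicitly chose a cloud model
    cloud_only: bool = False                # opt-in mode for older Macs


def resolve_backend(s: AISettings) -> str:
    """Return 'local', 'cloud', or 'unavailable' for the next action."""
    if s.cloud_only:
        # Cloud-only mode never touches a local model.
        return "cloud" if s.cloud_api_key else "unavailable"
    if s.prefer_cloud and s.cloud_api_key:
        return "cloud"
    if s.local_model is not None:
        return "local"
    # No local model and no explicit cloud choice: do not send text
    # off the device just because a key happens to be stored.
    return "unavailable"


def status_label(s: AISettings) -> str:
    """Text a palette or settings row could show for the active AI status."""
    backend = resolve_backend(s)
    if backend == "local":
        return f"Local AI: {s.local_model}"
    if backend == "cloud":
        return "Cloud AI"
    return "No model available"


if __name__ == "__main__":
    print(status_label(AISettings()))
    print(status_label(AISettings(prefer_cloud=True, cloud_api_key="example-key")))
```

The design choice worth noticing is the last branch of resolve_backend: in this sketch a stored key alone never routes text to the cloud. The user has to opt in, which is the "safer default" idea in code form.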

The moment we knew

An ER nurse emailed us. She uses Vehla to rewrite shift-handoff notes — clinical jargon she can't legally paste into ChatGPT. Local mode meant she could finally use AI at work. The email had a line that read "I'm not technical, I just want it to stay on my computer."

That's the entire pitch. Plenty of people aren't paranoid; they just have specific things they don't want leaving the machine. The default should serve them.

What we'd do differently

We waited too long. We could have shipped local-first in v1.0 by putting a smaller model behind a slower default. Worrying about quality kept us from giving users the choice. The lesson, again, is the boring one: ship the safer default and let power users opt up, not the other way around.
