Cloud inference is the easiest version of a Mac AI app to ship. You add a key field, call an API, stream the result, and move on.
Vehla takes the harder path by making Local AI a first-class setting. The app defaults to a Gemma 4 model for local inference, and cloud providers stay available when users want a frontier model or already have their own keys.
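To make "local by default, cloud when the user opts in" concrete, here is a minimal sketch of that selection rule. It is not Vehla's code: the `AISettings` shape, its field names, the `"local-default"` placeholder, and the `pick_backend` helper are all hypothetical, and the fallback to local when no cloud key is present is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AISettings:
    # Hypothetical settings shape; field names are illustrative only.
    use_local: bool = True                # local inference is the default
    local_model: str = "local-default"    # placeholder identifier, not a real model name
    cloud_provider: Optional[str] = None  # set only when the user picks a provider
    cloud_api_key: Optional[str] = None   # the user's own key, if they have one


def pick_backend(settings: AISettings) -> str:
    """Return 'local:<model>' or 'cloud:<provider>' for the next request."""
    wants_cloud = not settings.use_local
    has_cloud = bool(settings.cloud_provider and settings.cloud_api_key)
    # Cloud is used only when the user asked for it AND supplied a key.
    if wants_cloud and has_cloud:
        return f"cloud:{settings.cloud_provider}"
    # Assumption: anything else falls back to the local model.
    return f"local:{settings.local_model}"
```

With no configuration at all, `pick_backend(AISettings())` resolves to the local model. Text leaves the machine only after two deliberate user actions: switching off the local setting and entering a key.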
The objection list, scored honestly
Every objection we'd written against shipping local-first turned out to be smaller than we thought:
- "Local models are dumber." Sometimes true for complex code and reasoning, much less true for everyday rewriting, summarizing, tone changes, and extraction.
- "The download is too big." Local models are measured in gigabytes, so Vehla makes model choice explicit and shows download progress.
- "Apple Silicon is required." 84% of our trial users are on M-series already, and Local AI is best when the hardware is designed for it.
- "It'll be slower." Local speed depends on the Mac and selected model, but it avoids network latency and keeps sensitive text on the device.
What it actually cost
The real costs weren't engineering. They were the soft ones nobody puts on the roadmap.
The MLX runtime forced us to ship a 49 MB binary instead of a 6 MB one. That spooked one reviewer enough to make a YouTube video about "bloat". It's a fair complaint that we accept.
The main product cost is clarity. Users need to know which model is installed, which model is active, and when Local AI is being used. That is why Vehla shows active AI status in the palette and settings.
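The three things a user needs to see (what is installed, what is active, and whether Local AI is in use) can be reduced to one small piece of state. The sketch below is illustrative only: `AIStatus`, its fields, and the label wording are hypothetical and do not describe Vehla's actual UI strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AIStatus:
    # Hypothetical status record; names are illustrative only.
    installed_models: List[str] = field(default_factory=list)
    active_model: Optional[str] = None
    local_in_use: bool = True


def status_label(status: AIStatus) -> str:
    """Build the one-line status shown to the user."""
    if status.active_model is None:
        return "No model active"
    # A local model has to be installed before it can be reported as active.
    if status.local_in_use and status.active_model not in status.installed_models:
        return f"Local AI: {status.active_model} (not installed)"
    where = "Local AI" if status.local_in_use else "Cloud"
    return f"{where}: {status.active_model}"
```

Deriving the label from a single record keeps the palette and the settings screen from disagreeing about which model is actually handling the text.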
Some power users wanted a smaller cloud-only mode for older Macs. We added one. It costs us a settings row.
The moment we knew
An ER nurse emailed us. She uses Vehla to rewrite shift-handoff notes full of clinical jargon she can't legally paste into ChatGPT. Local mode meant she could finally use AI at work. The email had a line that read "I'm not technical, I just want it to stay on my computer."
That's the entire pitch. Plenty of people aren't paranoid; they just have specific things they don't want leaving the machine. The default should serve them.
What we'd do differently
We waited too long. We could have shipped local-first in v1.0 by accepting a smaller model and a slower default. Worrying about quality kept us from giving users the choice. The lesson, again, is the boring one: ship the safer default and let power users opt up, not the other way around.