Comparison
On-prem LLM vs an API
An 8-point comparison of running an LLM inside your perimeter versus calling a model on someone else's servers, and when each one fits.
In short
An API is access to a model running on someone else's servers, billed per token; an on-prem LLM runs inside your own perimeter on hardware you control. With an API your data leaves your environment and you never own the model; on-prem, the data stays in and the model is yours.

When each one fits
- An API fits: early experiments, low-sensitivity workloads, and spiky usage where you don't yet want to provision hardware.
- On-prem is required: when data legally cannot leave your perimeter, when you need air-gapped operation, or when you need to own and audit the model.
- The cost crossover: per-token billing grows with every request and never stops, while an owned model is largely a fixed cost. At sustained enterprise volume, owning the model becomes substantially cheaper, and you keep the asset.
- The one-way door: every prompt sent to an external API is data you can't recall. On-prem avoids that irreversible exposure entirely.
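As a rough illustration of the cost crossover, the break-even point is the monthly token volume at which per-token API spend equals the fixed monthly cost of an owned deployment. This is a minimal sketch: the function names and both figures are hypothetical placeholders, not Locai pricing or any provider's rates, so substitute your own quotes.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """API spend for one month at a flat per-token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens_per_month(fixed_monthly_cost: float,
                                price_per_million_tokens: float) -> float:
    """Monthly token volume at which API spend equals the fixed on-prem cost."""
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


# Hypothetical inputs (assumptions for illustration only):
FIXED_MONTHLY_COST = 10_000.0      # amortised hardware, power and operations
PRICE_PER_MILLION_TOKENS = 5.0     # blended input/output API price

break_even = break_even_tokens_per_month(FIXED_MONTHLY_COST, PRICE_PER_MILLION_TOKENS)
print(f"Break-even volume: {break_even:,.0f} tokens per month")
```

Below the break-even volume the API is cheaper. Above it, each additional token widens the gap in favour of the owned model. The sketch ignores utilisation limits, staffing, and price changes, all of which move the crossover in practice.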
On-prem LLM vs API LLM
| | On-prem LLM | API LLM |
|---|---|---|
| Data stays inside perimeter | Yes | No, sent to the provider |
| Own the weights | Yes | No |
| Air-gapped operation | Possible | Not possible |
| Domain-trained on your data | Yes, via post-training | No, general-purpose |
| Cost model | Fixed, owned asset | Per-token, recurring |
| Behaviour changes under you | Only when you choose | Can change at any time |
| Latency | Local, predictable | Network-dependent |
| ISO 27001-aligned deployment | Available | Depends on provider |
What this looks like with Locai
What sovereign AI actually looks like in production is the part most marketing skips, so here is the short version.
Locai Labs believes organisations should own their intelligence. Renting access to a general-purpose model that lives on someone else's servers is fine for low-stakes work; for the AI that touches your data, your customers and your decisions, the model itself should be yours. That is the bet behind everything we build.
It is also a bet that an expert model beats a generalist on the work that actually matters to your business. A smaller model trained on your data, your language, your workflows and your edge cases routinely outperforms much larger generalists on the tasks you care about, and it does so on infrastructure you control. The goal is not the biggest model; the goal is the right model for your business.
And it is deployed sovereignly. The owned model runs inside your perimeter, whether that is on-prem via Locai One, in your private cloud tenant, in a UK sovereign cloud, or fully air-gapped, depending on your residency and security requirements. Your prompts, your documents and your outputs stay inside your environment, under UK jurisdiction, with a data path designed to fit GDPR and the procurement standards regulated organisations are held to.
Frequently asked questions
Is on-prem AI hard to operate?
It used to be. Locai One packages a sovereign model, application layer, and serving stack into a fixed-cost on-prem appliance, so you get on-prem control without building the MLOps yourself.
What hardware do I need for an on-prem LLM?
It depends on model size. A 35B model can run on a single A100 80GB server, while larger configurations use multi-GPU nodes. Locai advises on hardware or supplies it as part of Locai One.
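For a first sanity check on sizing, weight memory is roughly parameter count times bytes per parameter. The sketch below is a weights-only estimate; the helper names and precision table are illustrative assumptions, and the precision used for the 35B figure above is not stated here. Real deployments also need memory for the KV cache, activations, and the serving runtime.

```python
# Approximate bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def headroom_gb(params_billions: float, precision: str, gpu_memory_gb: float) -> float:
    """Memory left on the GPU after loading weights, before KV cache and activations."""
    return gpu_memory_gb - weight_memory_gb(params_billions, precision)


for precision in BYTES_PER_PARAM:
    weights = weight_memory_gb(35, precision)
    spare = headroom_gb(35, precision, gpu_memory_gb=80)
    print(f"35B @ {precision}: {weights:.1f} GB weights, {spare:.1f} GB headroom on 80 GB")
```

At 16-bit precision a 35B model leaves only about 10 GB of an 80 GB card for everything else, so context length and concurrency targets often decide whether a single GPU is enough or quantisation or a second GPU is needed.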
Can on-prem models still improve over time?
Yes. With continual learning the on-prem model is retrained on your evolving data on a schedule, so it compounds in value rather than freezing at deployment.
Book a sovereign AI briefing
A 30-minute session on owning your model: deployment options, the data path, and a clear cost range for your use case.
