Model benchmarks

How Locai's Jupiter and L1-Large models perform against frontier models, openly reported.

In short

Locai's models are independently benchmarked and openly reported. Locai L1-Large, a 235B-parameter Mixture-of-Experts model, ranks #1 on Arena Hard v2, ahead of the frontier API models listed below. The result shows that a model you own can match or beat ones you rent.

Arena Hard v2

A demanding, human-preference-aligned evaluation of hard prompts.

Rank	Model	Type
1	Locai L1-Large (Locai Labs)	Owned · post-trained
—	GPT-5 (OpenAI)	Frontier API
—	Claude (Anthropic)	Frontier API
—	Gemini (Google)	Frontier API

Evaluations we report

Benchmark	Measures	Result
Arena Hard v2	Hard, human-preference-aligned prompts	Locai L1-Large #1
IFEval	Instruction-following accuracy	See technical report
Terminal Bench	Agentic / terminal-use capability	See technical report
BritXNLI	UK-grounded natural-language inference	See technical report
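
To make the "Measures" column more concrete, here is a minimal sketch of how an instruction-following accuracy in the spirit of IFEval can be computed: each response is paired with programmatically checkable constraints, and it counts as correct only if it satisfies all of them. The checker functions, sample responses, and names below are hypothetical illustrations, not IFEval's actual harness or Locai's evaluation code.

```python
from typing import Callable, List, Tuple

# A constraint is any function that inspects a response and returns True or False.
Constraint = Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Hypothetical checker: the response has at most `limit` words."""
    return lambda text: len(text.split()) <= limit


def contains(keyword: str) -> Constraint:
    """Hypothetical checker: the response mentions `keyword` (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()


def prompt_level_accuracy(samples: List[Tuple[str, List[Constraint]]]) -> float:
    """Fraction of responses that satisfy every constraint attached to them."""
    if not samples:
        raise ValueError("no samples to score")
    passed = sum(
        all(check(response) for check in checks) for response, checks in samples
    )
    return passed / len(samples)


# Illustrative data only: (model response, constraints from the prompt).
samples = [
    ("London is the capital of the UK.", [max_words(10), contains("London")]),
    ("The capital is a large city on the Thames.", [contains("London")]),
]

accuracy = prompt_level_accuracy(samples)
print(f"{accuracy:.2f}")
```

Scoring a response as correct only when every constraint passes is the stricter of the two common choices; counting each constraint separately would give a per-instruction accuracy instead.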

Methodology & sources

Results are drawn from Locai's published technical reports and independent evaluations. Model weights for the Jupiter family are released openly, so results are reproducible.

Models on Hugging Face →

Technical reports →

See the model in your environment

Book a briefing to discuss benchmarks against your own tasks and a deployment that fits your perimeter.

Book a sovereign AI briefing

Explore enterprise AI