Model benchmarks
How Locai's Jupiter and L1-Large models compare with frontier models, with the results reported openly.
In short
Locai's models are independently benchmarked, and the results are reported openly. Locai L1-Large, a 235B-parameter Mixture-of-Experts model, ranks #1 on Arena Hard v2, ahead of leading frontier models. On this benchmark, at least, a model you own can match or beat the models you rent.
Arena Hard v2
A demanding evaluation on hard prompts, scored to align with human preferences.
| Rank | Model | Type |
|---|---|---|
| 1 | Locai L1-Large (Locai Labs) | Owned · post-trained |
| — | GPT-5 (OpenAI) | Frontier API |
| — | Claude (Anthropic) | Frontier API |
| — | Gemini (Google) | Frontier API |
Evaluations we report
| Benchmark | Measures | Result |
|---|---|---|
| Arena Hard v2 | Hard, human-preference-aligned prompts | Locai L1-Large #1 |
| IFEval | Instruction-following accuracy | See technical report |
| Terminal Bench | Agentic / terminal-use capability | See technical report |
| BritXNLI | UK-grounded natural-language inference | See technical report |
Methodology & sources
Results are drawn from Locai's published technical reports and from independent evaluations. Model weights for the Jupiter family are released openly, so results for those models can be reproduced by anyone.
See the model in your environment
Book a briefing to benchmark the models against your own tasks and to discuss a deployment that fits your perimeter.