Model benchmarks

    How Locai's Jupiter and L1-Large models perform against frontier models, with results reported openly.

    In short

    Locai's models are independently benchmarked, and the results are reported openly. Locai L1-Large, a 235B-parameter Mixture-of-Experts model, ranks #1 on Arena Hard v2, ahead of leading frontier models. This shows that a model you own can match or beat the models you rent.

    Arena Hard v2

    A demanding, human-preference-aligned evaluation of hard prompts.

    Rank  Model                         Type
    1     Locai L1-Large (Locai Labs)   Owned · post-trained
          GPT-5 (OpenAI)                Frontier API
          Claude (Anthropic)            Frontier API
          Gemini (Google)               Frontier API

    Evaluations we report

    Benchmark        Measures                                  Result
    Arena Hard v2    Hard, human-preference-aligned prompts    Locai L1-Large #1
    IFEval           Instruction-following accuracy            See technical report
    Terminal Bench   Agentic / terminal-use capability         See technical report
    BritXNLI         UK-grounded natural-language inference    See technical report

    Methodology & sources

    Results are drawn from Locai's published technical reports and from independent evaluations. Model weights for the Jupiter family are released openly, so results for those models can be reproduced.

    See the model in your environment

    Book a briefing to discuss benchmarking on your own tasks and a deployment that fits your security perimeter.