Evals. The thing that decides if your AI ships.
Moonlabs is the operator-led AI Academy in Derby. We run three live companies — Homemove, home.co.uk and homedata.co.uk — and we teach twelve students per cohort to ship a real AI product, sell it to a real customer, and raise on it. Three pillars: Coding, Commercials, Investment. Twelve weeks. £6,000.
Moonlabs is what we are. Two operators — James Freestone and Louis O’Connell-Bristow — who run Homemove, home.co.uk and homedata.co.uk. Every AI surface across these three businesses sits behind eval suites we wrote ourselves — the discipline on this page is the one that keeps real users from churning when we ship.
The Academy is what we do. A twelve-week, in-person, twelve-student cohort in Derby. You build a real AI product. You sign a paid pilot on it. You write a deck and a financial model. You leave with a deployed system, a paying customer reference and a live investor pipeline. Coding, Commercials, Investment — the three pillars taught in equal weight every week.
Why this page exists. Evals are the most under-taught discipline in the AI engineering stack. Senior AI engineers routinely cite them as the main reason one team ships and another does not. An eval is the contract between a stochastic system and a deterministic product. Without one you cannot tell whether your AI feature got better or worse this week, you cannot swap providers with confidence, and you cannot explain a regression to your customer. You leave the Academy fluent in the discipline that separates a working AI system from a polished demo.
Coding · production evals, end to end
Golden sets, LLM-as-judge with calibrated rubrics, structured-output validators, regression test harnesses gated in CI, observability for live evals, drift dashboards, and evals as procurement criteria when you choose vendors. Model and provider portability, so you can swap Claude for an OpenAI model or an open-weight model in an afternoon. A deployed AI surface behind your own eval suite by week twelve.
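To make the golden-set and CI-gate ideas concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three golden cases, the substring check, the 0.9 threshold and the stub_model function are hypothetical, not the Academy's own harness. The model is passed in as a plain function from prompt to text, which is one simple way to keep the suite independent of any provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str  # substring the answer must contain (illustrative check)


# Hypothetical golden set; a real one is curated from your own product.
GOLDEN_SET = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
    GoldenCase("Which planet is known as the Red Planet?", "Mars"),
]


def pass_rate(model: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in cases if case.expected.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def gate(
    model: Callable[[str], str], cases: list[GoldenCase], threshold: float = 0.9
) -> bool:
    """Return True when the model clears the threshold (assumed value)."""
    return pass_rate(model, cases) >= threshold


def stub_model(prompt: str) -> str:
    """Stand-in for a real provider call, so the sketch runs offline."""
    answers = {
        "What is the capital of France?": "The capital is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
        "Which planet is known as the Red Planet?": "That would be Mars.",
    }
    return answers.get(prompt, "I don't know.")


if __name__ == "__main__":
    # In CI, a non-zero exit code fails the build and blocks the release.
    raise SystemExit(0 if gate(stub_model, GOLDEN_SET) else 1)
```

Swapping providers then means writing one new function with the same signature and running the same gate against it.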
Commercials · selling AI quality as a service
Eval-as-a-service is the “DevOps consultancy” of the AI era and customers are paying real money for it. Pricing an evals retainer, the discovery call, a one-page pilot on one AI feature, the first paying customer. A paid pilot by week six — eval work is one of the easiest AI offers to sell into an SMB right now because the failure mode is so visible.
Investment · raising on AI quality and observability
AI evals and observability is one of the loudest funded thesis areas of 2025-26: Braintrust raised $36m, Arize at $400m+, Langfuse, Patronus AI raised at $50m from Lightspeed, Galileo $50m from Battery, Athina, Promptfoo (acquired interest from cloud providers), Vellum. Cap table, ten-slide deck, financial model. A live investor pipeline by demo day.
Common questions.
I have heard of evals but never written one. Is this realistic?
Yes. Most engineers we meet are in exactly the same position. By the end of week two you will have written your first eval and gated a feature behind it; by week twelve you will have a full suite.
Will I learn LLM-as-judge specifically?
Yes — in detail. Where it works, where it fails, how to calibrate the judge against a golden set, when to lean on structured validators instead. The patterns that hold up under audit.
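One way to picture calibrating a judge against a golden set is as an agreement check: compare the judge's verdicts with human labels on the same examples. The sketch below assumes simple "pass"/"fail" labels and uses raw agreement plus Cohen's kappa, which discounts agreement expected by chance. The label values and function names are illustrative choices, not a prescribed method.

```python
from collections import Counter


def agreement(human: list[str], judge: list[str]) -> float:
    """Fraction of examples where the judge matches the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human: list[str], judge: list[str]) -> float:
    """Agreement corrected for chance; 1.0 is perfect, 0.0 is chance level."""
    n = len(human)
    observed = agreement(human, judge)
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    labels = set(human) | set(judge)
    # Probability that both raters pick the same label purely by chance.
    expected = sum(human_counts[l] * judge_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four golden-set examples.
    human_labels = ["pass", "pass", "fail", "fail"]
    judge_labels = ["pass", "pass", "fail", "pass"]
    print(agreement(human_labels, judge_labels))
    print(cohens_kappa(human_labels, judge_labels))
```

A judge with high raw agreement but low kappa is often just repeating the majority label, which is a signal to tighten the rubric or fall back to a structured validator.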
How does this fit with the rest of the curriculum?
Evals are the spine that runs through every project. RAG, agents, prompts, fine-tuning — all of them ship behind eval suites you write yourself.
Do I need a specific stack or framework?
No. We teach the patterns provider-agnostically; the framework choice is your project’s decision. Promptfoo, Inspect, OpenAI Evals, Braintrust or hand-rolled: all are valid, and we will help you pick.
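As an illustration of the hand-rolled option, a structured-output validator needs nothing beyond the Python standard library. The schema here (a "category" from a fixed list and a "confidence" between 0 and 1) is a hypothetical example, not a format the Academy prescribes.

```python
import json

# Hypothetical schema for a ticket-classification feature.
ALLOWED_CATEGORIES = {"billing", "technical", "other"}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems with a model response; empty means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []

    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        problems.append(f"category must be one of {sorted(ALLOWED_CATEGORIES)}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(confidence, (int, float)) and not isinstance(
        confidence, bool
    )
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number between 0 and 1")

    return problems


if __name__ == "__main__":
    print(validate_output('{"category": "billing", "confidence": 0.92}'))
    print(validate_output("Sure! Here is the JSON you asked for."))
```

Because the check is deterministic, it runs on every response in production as well as in the test suite, and the same function works whichever provider produced the text.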
Is this an LLMOps course?
It overlaps. The Academy is broader (engineering, commercials, investment, deployment); the evals depth is the LLMOps-shaped part of it. If LLMOps is your job title, the Academy is one of the few courses in the UK that take the topic seriously.
Build it. Sell it. Raise on it. In twelve weeks.
Tell us what AI surface you would eval first and what would break before the eval caught it. James and Louis read every application personally and reply inside the week.
© 2026 Moonlabs Incubator. All rights reserved.