How to hire the AI-native operator

The job market for AI engineering hires in 2026 is broken in specific, fixable ways. Companies are paying London CS-graduate rates for talent that has never shipped a prompt to production. Candidates with three years of frontend experience and a single Coursera certificate are commanding senior salaries on the strength of having "AI engineer" in their job title. The signal is muddy and the noise is loud.

This post is the hiring playbook we run when companies in our network ask us to help them screen candidates. It is the same screen we apply when we are hiring into Homemove, home.co.uk, and homedata.co.uk. It is what we will use this September when we make introductions for our first Academy demo day.

The shift in what good looks like

Three years ago, "AI engineer" meant a research scientist or a machine learning engineer: someone who could train a model, design a loss function, or debug a CUDA kernel. The total addressable hiring pool was small, the candidates were expensive, and the bar was clear.

Today, "AI engineer" means something different. It means a generalist software engineer who can integrate frontier models into a product, write evals, manage prompts, deploy a RAG pipeline, and make a customer-facing AI feature work reliably. The training-the-model layer has been almost entirely commoditised. The integration layer has expanded enormously.

The hiring market has not kept up. We see two failure modes routinely:

Companies hiring research-flavoured candidates for integration-flavoured roles. The candidate cannot ship a product; they leave or get fired within a year.
Companies hiring frontend-flavoured candidates and giving them the "AI engineer" title because nobody on the team can tell the difference. The product breaks in the field within six months and the team loses confidence in the hire.

Both of these are expensive. In UK terms, a bad hire on £75k who leaves within twelve months costs roughly £150k all-in (recruiter fees, lost productivity, replacement cost, knock-on team morale). Getting the screen right pays for itself many times over.

The five questions that actually predict performance

We have refined the screening process down to five questions. They are simple, but they discriminate. If a candidate can answer at least four of the five well, hire. If they can answer two or fewer, do not.

1. "Walk me through the last prompt you wrote that broke in production. What was the failure mode, how did you find it, and what did you change?"

This question does several things at once. It checks that the candidate has shipped prompts to production (which eliminates the Coursera-only crowd). It checks that they have an observability discipline (otherwise how did they find the failure?). It checks that they understand the model's failure modes (otherwise the fix will be theatre). And it checks for narrative clarity: can they explain a debugging story without hand-waving?

Bad answers: "I don't really remember" or "the prompt was working fine, then the model updated." Good answers: a specific story with a specific failure (hallucinated field, drifted output format, prompt injection, latency spike), a specific debugging path (eval suite, log inspection, regression test), and a specific fix (added few-shot examples, constrained the output schema, changed the model temperature).
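To make "constrained the output schema" concrete, here is a minimal sketch of the kind of check a strong candidate might describe. It uses only the Python standard library. The field names in `EXPECTED_FIELDS` and the function `validate_model_output` are hypothetical, chosen for illustration; a real team would more likely lean on a schema library or the provider's structured-output feature.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
EXPECTED_FIELDS = {"address": str, "postcode": str, "bedrooms": int}


def validate_model_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output is acceptable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Drifted output format: the model replied with prose instead of JSON.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        # (this sketch's schema has no boolean fields).
        elif isinstance(data[field], bool) or not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    # Hallucinated field: anything the schema does not name.
    for field in data:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    return problems
```

A candidate who has fixed a real production failure tends to describe something of this shape unprompted: where the check runs, what happens on failure (retry, fallback, alert), and how often it fires.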

2. "How do you test an AI feature?"

If they answer "unit tests", they have not built one. If they answer "we look at the output and it seems fine", they have not built one at scale. If they answer with a real eval discipline (gold sets, regression snapshots, LLM-as-judge for fuzzy outputs, golden conversations for chat), they have done the work.

We let candidates ramble on this one. The depth of the answer is the signal.
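For readers who have not seen one, a gold-set eval can be very small. The sketch below is an illustration under stated assumptions, not a description of any particular team's harness: `GoldCase`, `EvalReport`, `run_gold_set`, and `fake_model` are all hypothetical names, and `fake_model` stands in for a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldCase:
    """One hand-labelled example: an input and the facts the output must contain."""
    name: str
    prompt_input: str
    must_contain: tuple[str, ...]


@dataclass(frozen=True)
class EvalReport:
    passed: int
    total: int
    failures: tuple[str, ...]

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def run_gold_set(generate: Callable[[str], str], cases: list[GoldCase]) -> EvalReport:
    """Run every gold case through `generate` and record which ones fail."""
    failures = []
    for case in cases:
        output = generate(case.prompt_input).lower()
        missing = [s for s in case.must_contain if s.lower() not in output]
        if missing:
            failures.append(f"{case.name}: missing {missing}")
    return EvalReport(len(cases) - len(failures), len(cases), tuple(failures))


def fake_model(text: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch)."""
    if "refund" in text.lower():
        return "Refunds are processed within 14 days."
    return "I am not sure."


GOLD_SET = [
    GoldCase("refund_window", "How long do refunds take?", ("14 days",)),
    GoldCase("opening_hours", "When are you open?", ("9am",)),
]

report = run_gold_set(fake_model, GOLD_SET)
print(f"{report.passed}/{report.total} passed")
for failure in report.failures:
    print("FAIL", failure)
```

Substring matching is the crudest possible grader. A candidate with real depth will explain where it breaks down and when they reach for regression snapshots or an LLM-as-judge instead.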

3. "Tell me about a decision you made between Postgres and a specialist vector DB / between Cursor and an alternative IDE / between Anthropic and another model provider. What did you pick and why?"

The decision itself does not matter. The reasoning matters. A candidate who has opinions on the architectural trade-offs and can articulate them precisely has shipped at depth. A candidate who has memorised vendor marketing copy will fold on this question within thirty seconds.

4. "What does your prompt folder look like in the repo you currently work on?"

This is the most ruthless one. Most teams have no prompt management discipline. If the candidate's team does, the candidate will describe it in detail and have opinions about why theirs is good. If the candidate's team does not, they will look uncomfortable, mumble something about "we just have a few in code", and you have learned everything you need to learn.

This question is also a positive selector: a candidate who has built the prompt discipline on their team is a leadership-track hire even at the early-career band.

5. "Show me a screen of code you wrote yesterday. Talk me through it."

The cheapest and most powerful question. Candidates who cannot or will not pull up a screen of code in the interview are a hard no. Candidates who pull it up and walk through it confidently, including what they would change if they had another hour, are demonstrating in real time the loop they will repeat for the next two years of employment.

We have stopped doing take-home tests entirely. They favour candidates with time, not candidates with skill. The five-minute live walkthrough is more predictive and ten times faster.
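The hire / no-hire threshold stated at the start of this section can be written down so that every interviewer applies it the same way. This is a sketch only: the function name `screen_outcome` is hypothetical, and the "judgement call" outcome for exactly three strong answers is our labelled assumption, since the rule as stated only covers four or more and two or fewer.

```python
def screen_outcome(strong_answers: int) -> str:
    """Map the number of well-answered questions (0 to 5) to a screen outcome."""
    if not 0 <= strong_answers <= 5:
        raise ValueError("strong_answers must be between 0 and 5")
    if strong_answers >= 4:
        return "hire"
    if strong_answers <= 2:
        return "no hire"
    # Assumption: the stated rule does not cover exactly three strong answers.
    return "judgement call"
```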

The questions that look good but do not work

A non-exhaustive list of questions we have tried and abandoned.

"How would you implement an LLM-powered customer support bot from scratch?" Too much room for hand-waving. Strong candidates and weak candidates produce broadly similar answers; the discrimination is poor.
"Tell me about your favourite open-source project." Tests narrative skill, not engineering skill. Smooth-talking candidates over-perform.
"What is your salary expectation?" Counter-productive. Anchors the candidate too early and loses good people who are uncomfortable negotiating.
"Where do you see yourself in five years?" Theatre. Tells you nothing.
Anything that involves a whiteboard algorithm puzzle. The skill we are hiring for is shipping AI features end-to-end, not reversing a binary tree under time pressure.

The general rule: the questions should map directly to the work the candidate will do in their first ninety days. If the question is about something they will never actually do, replace it.

Calibrating salary

We see a consistent UK band for AI engineering hires in 2026:

Strong candidates, less than two years of post-graduate experience: £55k-£70k
Strong candidates, two to four years of relevant experience: £75k-£95k
Strong candidates with shipped product evidence and clear founder posture: £95k-£130k

London tier is roughly 15-25% above these bands; the Midlands and northern UK are roughly 10-15% below. Remote-first companies are bidding at something close to London tier even for non-London talent.
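As a worked example of the regional adjustment, the sketch below applies the percentage ranges to the base bands. It rests on one assumption of ours: the smaller adjustment is applied to one end of the band and the larger to the other, which gives the widest plausible range. The names `BASE_BANDS`, `REGION_ADJUSTMENT_PCT`, and `regional_band` are hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
# Base UK bands from the list above, in pounds: (low, high).
BASE_BANDS = {
    "under two years": (55_000, 70_000),
    "two to four years": (75_000, 95_000),
    "shipped product, founder posture": (95_000, 130_000),
}

# Percentage adjustment applied to the (low, high) ends of a band.
# Assumption: pairing the ends this way yields the widest plausible range.
REGION_ADJUSTMENT_PCT = {
    "london": (15, 25),
    "midlands_north": (-15, -10),
}


def regional_band(band: tuple[int, int], region: str) -> tuple[int, int]:
    """Scale a (low, high) salary band by the region's percentage adjustment."""
    low, high = band
    low_pct, high_pct = REGION_ADJUSTMENT_PCT[region]
    return (low * (100 + low_pct) // 100, high * (100 + high_pct) // 100)


for label, band in BASE_BANDS.items():
    print(label, "London:", regional_band(band, "london"),
          "Midlands/North:", regional_band(band, "midlands_north"))
```

On this reading, the under-two-years band of £55k-£70k becomes roughly £63k-£88k in London and roughly £47k-£63k in the Midlands and northern UK.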

The single biggest mistake we see is companies anchoring on frontend engineer salary for AI engineering roles. The roles are scarcer and the talent pool is materially thinner. Pay accordingly or accept the consequence.

What we are running on demo day

On 3 September 2026, twelve Moonlabs Academy graduates will present the companies they have built over the previous twelve weeks. Companies in our hiring network are invited to attend in Derby. The graduates we are most confident in have already had the screen above applied — they pass on four or five of the five questions, before they even get to you.

If you are a UK-based company building AI features and you would like to attend, the invitation is on the for-employers page. No placement fee. We make warm introductions; the rest of the process is yours.

The Moonlabs Academy graduates twelve AI-native operators per cohort. Demo day is 3 September 2026 in Derby. If you are hiring, email James.

James Freestone
