Apply to Moonlabs

Cursor, Claude Code, and the ten-times operator

The “10x engineer” of the 2010s is now the “10x operator” of the 2020s — and the tools that produce them are not secret. We use them every day. Here is what they are, how they fit together, and which ones are actually load-bearing.

James Freestone Co-founder, Moonlabs · 18 February 2026 · 7 min read

The “10x engineer” of the 2010s was mostly a myth used to justify hiring decisions that did not survive contact with reality. The 10x operator of the 2020s is real, measurable, and being produced by a specific stack of tools that almost anyone reading this could install in the next hour.

This essay is what we actually use, ranked by how load-bearing it is to the way we work, with honest notes on what each tool is for and what it is not for. No affiliate links. Almost none of these companies know we exist.

The frame: what counts as “load-bearing”

A tool is load-bearing for us if removing it would meaningfully slow down our weekly output. Not “make us slightly less comfortable.” Genuinely slow down output. This is a strict bar and it disqualifies a lot of things that get written about as if they were essential.

The tools that pass the bar fall into four buckets:

  1. The IDE (where the code is written).
  2. The agent layer (where work happens unsupervised).
  3. The reasoning layer (where decisions get made).
  4. The plumbing (everything else that has to not break).

We will go through each.

The IDE: Cursor, with a clear-eyed view

We use Cursor. So does almost every operator we respect. It is the highest-leverage single tool in our stack and the one we would replace last.

The thing Cursor gets right that the alternatives have not yet matched is the flow of multi-file edits with full project context. The model is not just completing the line under the cursor. It is reasoning across the repo. This is a qualitatively different experience than autocomplete and it is the reason productivity numbers in our team roughly doubled in the six months after we standardised on it.

The thing Cursor gets wrong, and we want to be honest about this because the love letters in the press do not, is that it still occasionally produces confident-looking code that does not work. The frequency has fallen by about two-thirds since we started using it in early 2024, but it has not gone to zero. The operator skill that matters here is the ability to read a diff fast and catch the half-correct ones. That skill is teachable but it is not free. A new student on the Academy takes about three weeks to develop the reflex.

Verdict: load-bearing. Replacement candidate: none currently competitive at the kind of multi-file reasoning we lean on.

The agent layer: Claude Code, and the shift it caused

The single biggest shift in how we work in the last year is moving real work off the IDE keyboard and into Claude Code running in the terminal.

The shape of the day used to be: open Cursor, type, read, type, ship. The shape of the day now is: write a clear specification of what needs to happen, hand it to Claude Code, go and do something else, come back twenty minutes later and review the work. Sometimes it is right and gets merged. Sometimes it is wrong and gets handed back with a clarification. Sometimes it is right but ugly and gets rewritten by hand for taste.

The reason this matters is throughput. A single operator running Cursor full-time can ship at perhaps three times the rate of a 2022 engineer. The same operator running Claude Code in parallel on three different tasks can ship at closer to eight or nine times. The ceiling on the workflow is no longer typing speed. It is the operator’s ability to specify clearly and review thoroughly.

A note on the failure mode. The failure mode is not “the agent does something wrong.” That is recoverable. The failure mode is “the agent does something plausibly correct that the operator approves without reading carefully.” This is the new technical-debt accelerator. We have explicit code review rituals now that did not exist in 2023, specifically because the agent layer makes it possible to ship at a rate where the human pipeline cannot keep up. The fix is process, not a different tool.

Verdict: load-bearing. The change in how we work since adopting it is bigger than any other change in the last five years.

The reasoning layer: Claude (and Opus specifically)

Distinct from the agent layer is the reasoning layer. The place we go to think. Architecture decisions, contract reviews, deck rewrites, pitch coaching, financial model logic checks. We use Claude — primarily Opus — for this work. The reason is specific: it is the model whose written output most often holds up under our editorial standards on the first pass.

We are aware this is a moving target. GPT, Gemini, others are all credible at this work and the leaderboard shifts every six months. The honest answer is to keep two or three models in rotation and pick per task. We are not religious about it. We do find ourselves reaching for Opus first on long-form judgment work, and reaching for the lighter, faster models on quick lookup and synthesis work where latency matters more than depth.

The skill that matters here is prompt taste. The student who can write a 300-word brief that elicits a usable first draft is operating at a fundamentally different level from the one who types “help me with my pitch deck.” Prompt taste is largely a function of having a clear model of what good output looks like, which is itself a function of having seen enough good output to know one when you see it. It cannot be taught in a vacuum. It can only be taught in the context of a specific output the student already cares about.

Verdict: load-bearing. The lever for thinking, not for typing.

The plumbing: the unglamorous tier

Several things sit in the “plumbing” tier. They are critical, but the choice of vendor matters less than the operator’s discipline in using them. We list these briefly because writing a thousand words about which CI provider we use would be tedious for everyone.

  • Version control: Git on GitHub, no special trick. The trick is the rituals: pull request templates, draft PR culture, fast review SLAs.
  • Deployment: We mostly run on a small number of well-understood providers. Fly, Vercel, Render, occasionally DigitalOcean for things that want to live close to a Postgres. Choice of provider matters less than picking one and learning its failure modes.
  • Database: Postgres, managed. We do not run our own. Life is too short.
  • Observability: Sentry for errors, simple structured logging into a hosted store. The temptation to over-instrument is real and should be resisted in the first ninety days of a company. Catch errors. Worry about everything else later.
  • CI: GitHub Actions, kept simple. The single biggest mistake we see is teams building elaborate CI pipelines for products that have no users. The pipeline should be exactly as complex as the product warrants and not a step more.

Verdict: load-bearing in aggregate, individually replaceable. The discipline is the moat, not the brand.

What the stack does not include

There are several categories of tool that get a lot of press in the AI tooling community and that we deliberately do not lean on. We mention them here because the absences are also signal.

  • “AI productivity” chrome apps. The constellation of menubar AI assistants, browser side panels, copilots-for-everything. Almost all of them produce a small marginal lift at the cost of a much larger attention tax. We have uninstalled more than we have installed.
  • Vector databases as a default. Specialised vector stores are excellent for specific RAG workloads. They are wildly overprescribed for products that would be perfectly happy with Postgres and pgvector. We use Postgres until proven otherwise.
  • Heavy agentic frameworks. The space of LangChain-likes has matured but is still over-engineered for most real workloads. We tend to write small bespoke loops rather than adopt the whole framework. The cost of carrying the framework dependency exceeds the cost of writing the loop in almost every case we have measured.
  • AI-first project management tools. We use Linear and Notion like everyone else. The “AI-first” PM tools we have trialed have universally been a worse experience than Linear with a sensible workflow on top. Maybe this changes; today it has not.

None of this is gospel. It is what works for two operators running a portfolio of AI-flavoured businesses in 2026. Your context may differ. The reason we publish the list is not to tell you what to install. It is to give you a calibrated reference point so you can argue with us about it.

The bigger point

The instinct to read a list like this and immediately go install all of it is exactly the wrong instinct. The tools are not the leverage. The discipline of using them well is the leverage. We have watched students install the entire stack in week one of the Academy and produce no better output than they did before. We have watched other students stay on a fairly basic setup for the first month and out-ship the heavily-tooled ones by a factor of three, because they spent their week running the play instead of reorganising the toolbox.

A useful frame: every tool you adopt costs you about thirty hours of fluency before it returns positive value. Most operators try to spend two hundred hours of fluency budget in their first month. Then they wonder why their output went down for a quarter.

Pick three. Get fluent. Add the fourth when one of the three is genuinely the bottleneck. Not before.

That is the actual stack.


The Moonlabs Academy runs twelve-week cohorts in Derby. Coding, commercials, and investment, taught by James Freestone and Louis O’Connell-Bristow. Cursor and Claude Code are part of the standard kit.

About the author

James Freestone

Co-founder, Moonlabs. Operator behind home.co.uk, Homemove and homedata.co.uk. AI-native since the week ChatGPT shipped.

Work with us

Keep reading

All essays

Your next chapter starts here.

Tell us about the company you want to build. If we’re a fit, we’ll get back within a week.