Key Vocabulary

Vibe coding (Karpathy; see (Karpathy 2026)): Prompt-driven development with no attention to how the code works. “Accept the vibes.” Raises the floor — anyone can build software.
Agentic engineering (Karpathy; cf. Willison’s “vibe engineering” (Karpathy 2026; Willison 2025c)): The disciplined counterpart: specs, tests, review, ownership of outputs. Coordinating fallible agents while preserving quality. Raises the ceiling.
Software 3.0 (Karpathy; see (Karpathy 2026)): Programming through prompts, context, tools, and memory. The context window is the new program; the LLM is the interpreter.

The December 2025 Inflection

“I have never felt more behind as a programmer.” — Karpathy, Sequoia Ascent 2026 (Karpathy 2026)

For most of 2025, agents were useful but required frequent correction
Around December 2025: “the chunks just came out fine… I couldn’t remember the last time I corrected it”
The unit of work shifted: typing lines $\to$ delegating macro actions
- implement this feature, refactor this subsystem, write tests and fix failures

Models crossing the threshold: Claude Opus 4.5, Codex 5, Gemini 3

Tools: Claude Code (Feb 2025), Codex CLI (Apr 2025), Gemini CLI (Jun 2025), OpenCode

DHH’s Reversal

Summer 2025 (Lex Fridman podcast):

AI coding tools made “competence drain out of my fingers.” Programming is like playing guitar — you don’t let someone else play for you.

January 2026 (Hansson 2026):

“It’s more like working on a team… I just review the final outcome, offer guidance when asked, and marvel at how this is possible at all.”

What changed: not his philosophy, but the tools.

Tab completion felt like someone stealing the keyboard
Agents feel like wearing a mech suit

What Disciplined Practice Looks Like

Willison’s Framework

“If you’re going to exploit these new tools, you need to be operating at the top of your game.” (Willison 2025c)

AI rewards existing engineering practices:

Automated testing — agents fly with a good test suite; without tests, they claim success unchecked
Planning in advance — iterate on the plan first, then hand it to the agent
Documentation — feed context; the agent can only keep a subset of the codebase in view
Version control — agents are fiercely competent at git; use that
Code review — you are now reviewing constantly
Manual QA — predict and dig into edge cases the agent won’t flag
Knowing what to outsource — and what to handle yourself

“AI tools amplify existing expertise.”

Personal Aside: Reproducibility Is Part of the Equation

The readings cite tests, documentation, and version control as what makes agents fly. Reproducibility is conspicuously absent — yet it may matter as much:

Encourages separating side effects and statefulness — reproducible code is easier for agents to reason about. Functional programming’s edge in producing correct programs is the same advantage. An agent working on a pure, stateless module makes far fewer silent mistakes than one entangled in global state.
Enables agent bootstrapping — a fully reproducible project can tell an agent: clone, run the install script, run the tests, fix failures. E.g. curl -fsSL https://pixi.sh/install.sh | sh && pixi run test. The agent can spin up its own environment in a sandbox or cloud instance with no human scaffolding. Good CI practices are the same idea, already automated.
Improves verifiability — Karpathy’s thesis is that LLMs advance fastest where outputs can be verified. Reproducibility makes outputs verifiable: the same input must produce the same output, which is a directly checkable invariant.
Enables safe sandboxing — a self-contained, reproducible project can be handed to an agent running in a throwaway container or remote VM. This directly mitigates the exfiltration risk from the lethal trifecta (Willison 2025b): private data stays out of scope by construction.
Bisectability — deterministic builds make git bisect reliable. An agent can automatically bisect a regression, confident that differences in output reflect code changes rather than environment drift.
Idempotency — reproducible workflows tend to be idempotent. Agents can safely retry failed steps, re-run the pipeline, or roll back without worrying about accumulated side effects corrupting state.
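
A minimal sketch of two of the properties above, determinism and idempotency, assuming a toy workload. The names `simulate`, `is_reproducible`, and `ensure_record` are hypothetical and do not come from the readings; the point is only that both properties reduce to checks an agent can run by itself.

```python
import random


def simulate(seed: int, n: int = 5) -> list[float]:
    """Pure function: all randomness comes from the explicit seed."""
    rng = random.Random(seed)  # local generator, global RNG state is untouched
    return [round(rng.random(), 6) for _ in range(n)]


def is_reproducible(seed: int, runs: int = 3) -> bool:
    """Checkable invariant: the same input must give the same output."""
    first = simulate(seed)
    return all(simulate(seed) == first for _ in range(runs))


def ensure_record(store: dict[str, list[float]], key: str, seed: int) -> bool:
    """Idempotent step: safe to retry, because a second call changes nothing.

    Returns True if work was done, False if the record already existed.
    """
    if key in store:
        return False
    store[key] = simulate(seed)
    return True


if __name__ == "__main__":
    results: dict[str, list[float]] = {}
    print(is_reproducible(seed=42))
    print(ensure_record(results, "run-a", seed=42))  # first call does the work
    print(ensure_record(results, "run-a", seed=42))  # retry is a no-op
```

Because `simulate` takes its seed as an argument rather than reading global state, a retry or a re-run in a fresh sandbox is expected to give the same result, which is what makes the check meaningful.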

Designing Agentic Loops

An effective loop needs (Willison 2025a):

A clear goal with success criteria
Tools the agent can use (shell, tests, linters, packages)
A feedback loop (run $\to$ check $\to$ iterate)

Works best for: debugging, performance optimisation, dependency upgrades, refactoring — anywhere with trial-and-error and verifiable outcomes.
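
The three ingredients above can be sketched as a small driver, assuming the agent is reduced to a hypothetical `propose` callable and the success criterion to a hypothetical `check` callable (in practice `check` might run a test suite or linter). This is an illustration of the run, check, iterate shape, not Willison's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int  # number of checks performed
    candidate: str   # last candidate that was checked


def run_agentic_loop(
    propose: Callable[[str, str], str],        # (candidate, feedback) -> new candidate
    check: Callable[[str], tuple[bool, str]],  # candidate -> (passed, feedback)
    initial: str,
    max_iterations: int = 5,
) -> LoopResult:
    """Run -> check -> iterate until the goal is met or the budget runs out."""
    candidate = initial
    for iteration in range(1, max_iterations + 1):
        ok, feedback = check(candidate)  # verifiable success criterion
        if ok:
            return LoopResult(True, iteration, candidate)
        if iteration < max_iterations:
            candidate = propose(candidate, feedback)  # agent revises using feedback
    return LoopResult(False, max_iterations, candidate)


if __name__ == "__main__":
    # Toy stand-ins: the "goal" is a string of three characters.
    def toy_check(text: str) -> tuple[bool, str]:
        return (len(text) == 3, f"length is {len(text)}, want 3")

    def toy_propose(text: str, feedback: str) -> str:
        return text + "a"

    print(run_agentic_loop(toy_propose, toy_check, initial=""))
```

The iteration cap is a design choice in this sketch: without a budget, a loop whose check can never pass would run indefinitely.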

Cf. Karpathy’s verifiability thesis (Karpathy 2026): AI advances fastest where outputs can be verified (tests pass/fail, code compiles/crashes, benchmarks improve).

When to Use an Agent: A Personal Heuristic

If you know how to do it: Try it with an agent anyway. You can evaluate the output, catch mistakes, and learn to direct agents effectively.
If you don’t know how to do it: Don’t delegate to an agent (yet). You cannot evaluate the output — and having it done for you suppresses the learning you need.

The correct moment to introduce an agent is after you understand the problem well enough to judge the answer.

The Amplification Claim

The consensus across readings:

“AI tools amplify existing expertise.” — Willison (Willison 2025c)

“Vibe coding raises the floor. Agentic engineering raises the ceiling.” — Karpathy (Karpathy 2026)

Evidence:

DHH: senior engineers at 37signals gain far more from AI tools (Orosz 2026)
Amazon: juniors can no longer ship agent-generated code without review (Orosz 2026)
Orosz survey: Builders benefit for large changes, but report identity loss and more AI slop to review (Orosz and Nilsson 2026)

But is this a comforting just-so story that flatters senior engineers?

If amplification is real, what happens to the junior pipeline?
Who develops expertise when the path to expertise changes?

Identity, Craft, and Authorship

DHH’s guitar analogy (Lex Fridman podcast, summer 2025):

The pleasure of programming is in the playing, not just the output.

DHH reversed himself in six months (Orosz 2026). But the feeling didn’t vanish for everyone.

Orosz survey: some Builders report grief at no longer coding by hand (Orosz and Nilsson 2026).

Karpathy (Karpathy 2026):

“You can outsource your thinking, but you can’t outsource your understanding.”

Open questions:

If you specify, review, and own the output but don’t type it — is it still your code?
Was DHH’s “lost competence” real, or was the framing wrong?
Does reviewing agent output build understanding, or erode it?

The Taste–Talent Gap

Ira Glass’s observation: individuals acquire taste much faster than talent.

Coding agents narrow this gap:

Taste drives the work; skill executes it — and agents now execute
Corollary: invest in taste (fast to acquire, high leverage) over memorising syntax (slow, increasingly low leverage)

The flip side — if you lack taste or domain knowledge:

You cannot judge outputs you cannot recognise as wrong.

A deeper corollary: as AI capability surpasses human ability in a domain, the human stops noticing further advances — because evaluation requires being close enough to the ceiling to see it.

Economic Dimension (for reference — not for discussion today)

Orosz & Nilsson survey (900+ engineers) (Orosz and Nilsson 2026):

Companies spending $100–200/month per engineer on AI tools
~30% of engineers hitting usage limits
Cost trajectory widely considered unsustainable
UK/EU companies significantly more budget-cautious than US

Three archetypes:

Builders: Care about quality and craft. Benefit from AI for large changes. Risk: identity loss, AI slop.
Shippers: Focus on outcomes. Most enthusiastic about AI. Risk: faster tech debt, building the wrong things.
Coasters: Can uplevel faster with AI. Risk: generating slop that frustrates Builders.

Over to You

Willison’s vibe engineering requires rigorous tests and specs to be responsible. What does responsible agentic coding look like for research software, where the test harness often doesn’t exist and “correct” is contested?

Discussion Questions

DHH reversed his position in ~6 months. Has anyone here had a similar shift? What triggered it?
Willison says AI amplifies existing expertise. Does that match your experience — is there a skill AI makes less valuable?
“You can outsource your thinking, but you can’t outsource your understanding.” Where’s the line for research software?
If an RSE uses an agent to write code for a research project, who is responsible for the correctness of that code? Does the answer change if it produces a published result?
Individual RSEs may have strong AI preferences — from agent-first to AI-free — and project PIs have their own requirements driven by personal preference or funding constraints. Should RSE–project matching take these into account, so that an RSE who only works with agents is not assigned to a project that prohibits AI use, and vice versa?

Non-Questions (out of scope today)

~~“Will AI replace RSEs?”~~ — labour-market predictions
~~“Which tool should we use?”~~ — tool recommendations
~~“Does AI actually make us faster?”~~ — empirical productivity measurement
~~“How do we get budget for AI tools?”~~ — procurement

Readings

New, optional: Karpathy (2026)’s From Vibe Coding to Agentic Engineering
Mandatory: Willison (2025c)’s Vibe engineering
Mandatory: Orosz (2026)’s DHH’s new way of writing code, which is a summary of a 2hr podcast.
Recommended: Willison (2025a)’s Designing agentic loops: good companion to Willison (2025c) above.
Optional: Orosz and Nilsson (2026)’s The impact of AI on software engineers in 2026: key trends
Optional: Hansson (2026)’s Promoting AI agents: DHH’s own (shorter) blog post making a statement similar to Orosz (2026) above.

Karpathy (2026)

The unit of programming changed from typing lines of code to delegating larger “macro actions”… This is why I think the profession is being refactored. The programmer is increasingly not just a code writer, but an orchestrator of agents.

My core automation framework is:

Traditional software automates what you can specify.

LLMs and reinforcement learning automate what you can verify.

$\text{capability spike} \approx \text{verifiability} \times \text{training attention} \times \text{data coverage} \times \text{economic value}$

I distinguish two related but different ideas:

Vibe coding raises the floor. It lets almost anyone create software by describing what they want.

Agentic engineering raises the ceiling. It is the professional discipline of coordinating fallible agents while preserving correctness, security, taste, and maintainability.

The old “10x engineer” idea may become much more extreme. People who master agentic workflows may outperform others by far more than 10x.

This means products need agent-native surfaces… I think about this in terms of sensors and actuators. A sensor turns some state of the world into digital information. An actuator lets an agent change something. The future stack is agents using sensors and actuators on behalf of people and organizations.

The right posture is neither dismissal nor blind trust. It is empirical familiarity: learn where they work, where they fail, what they were trained for, and how to build guardrails around them.

You can outsource your thinking, but you can’t outsource your understanding.

The scarce thing is shifting:

Less scarce: code generation, API recall, boilerplate, first drafts, repetitive setup, simple transformations.

More scarce: understanding, taste, eval design, security, system boundaries, agent orchestration, domain-specific feedback loops, and knowing when the model is off the rails.

My current worldview is not that AI simply makes everyone faster at the old work. It is that the work itself is being reorganized around agents. Software, research, education, infrastructure, and knowledge work are all becoming variations of the same pattern:

define the context

define the tools

define the feedback loop

define the guardrails

let agents work

preserve human understanding

Willison (2025c)

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works.

If you’re going to really exploit the capabilities of these new tools, you need to be operating at the top of your game. You’re not just responsible for writing the code—you’re researching approaches, deciding on high-level architecture, writing specifications, defining success criteria, designing agentic loops, planning QA, managing a growing army of weird digital interns who will absolutely cheat if you give them a chance, and spending so much time on code review.

Almost all of these are characteristics of senior software engineers already!

Orosz (2026)

A big win from using AI agents is tackling stuff that you wouldn’t have before. A senior engineer at 37signals ran a “P1 optimization” project to improve the fastest 1% of requests.

Running several AI agents feels less like “project management” and more like “wearing a mech suit.”

37signals has one designer for every two engineers.

AI agents could turn 37signals’ “designer model” into the industry standard.

Command Line Interfaces (CLI) feel like the ultimate AI interface, which validates the Unix philosophy of the 1970s.

Eight hours of sleep is non-negotiable – even during an AI gold rush!

Willison (2025a)

The thing to look out for here are problems with clear success criteria where finding a good solution is likely to involve (potentially slightly tedious) trial and error.

Willison (2025b)

The lethal trifecta of capabilities is:

Access to your private data—one of the most common purposes of tools in the first place!

Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM

The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)
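
As an illustration only, the three conditions can be encoded as a simple configuration check, assuming an agent's capabilities can be summarised as three booleans. The class and function names below are hypothetical and not part of any tool described by Willison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    private_data_access: bool         # can read private data
    untrusted_content_exposure: bool  # can ingest attacker-controlled content
    external_communication: bool      # can send data out


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three capabilities are present together."""
    return (
        caps.private_data_access
        and caps.untrusted_content_exposure
        and caps.external_communication
    )


if __name__ == "__main__":
    # A sandboxed agent on a self-contained project: no private data in scope.
    sandboxed = AgentCapabilities(
        private_data_access=False,
        untrusted_content_exposure=True,
        external_communication=True,
    )
    print(has_lethal_trifecta(sandboxed))
```

Removing any one of the three is enough to make the check return False, which is the sense in which a sandbox with no private data in scope sidesteps the combination.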

References

Hansson, David Heinemeier. 2026. “Promoting AI Agents.” Blog. David Heinemeier Hansson, January 7. https://world.hey.com/dhh/promoting-ai-agents-3ee04945.

Karpathy, Andrej. 2026. “From Vibe Coding to Agentic Engineering.” Blog. Bear Blog of Andrej Karpathy, April 30. https://karpathy.bearblog.dev/sequoia-ascent-2026/.

Orosz, Gergely. 2026. “DHH’s New Way of Writing Code.” Newsletter. The Pragmatic Engineer, April 8. https://newsletter.pragmaticengineer.com/p/dhhs-new-way-of-writing-code.

Orosz, Gergely, and Elin Nilsson. 2026. “The Impact of AI on Software Engineers in 2026: Key Trends.” Newsletter. The Pragmatic Engineer, April 14. https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026.

Willison, Simon. 2025a. “Designing Agentic Loops.” Blog. Simon Willison’s Weblog, September 30. https://simonwillison.net/2025/Sep/30/designing-agentic-loops/.

Willison, Simon. 2025b. “The Lethal Trifecta for AI Agents: Private Data, Untrusted Content, and External Communication.” Blog. Simon Willison’s Weblog, June 16. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/.

Willison, Simon. 2025c. “Vibe Engineering.” Blog. Simon Willison’s Weblog, October 7. https://simonwillison.net/2025/Oct/7/vibe-engineering/.

Journal Club

Setting the Stage

From Autocomplete to Agents