Journal Club

Coding With Agents: What’s Different, What’s at Stake

Kolen Cheung

May 6th, 2026

Setting the Stage

From Autocomplete to Agents

Autocomplete (2021–)
AI suggests the next line as you type. Accept or reject. GitHub Copilot, Cursor tab completion.
Coding agents (2025–)
AI runs tools in a loop toward a goal — writes code, runs tests, reads errors, iterates. Claude Code, Codex CLI, Gemini CLI, OpenCode.

“An AI agent is an LLM wrecking its environment in a loop.” — Solomon Hykes (via Willison 2025a)

Key Vocabulary

Vibe coding (Karpathy; see Karpathy 2026)
Prompt-driven development with no attention to how the code works. “Accept the vibes.” Raises the floor — anyone can build software.
Agentic engineering (Karpathy; cf. Willison’s “vibe engineering”; see Karpathy 2026; Willison 2025c)
The disciplined counterpart: specs, tests, review, ownership of outputs. Coordinating fallible agents while preserving quality. Raises the ceiling.
Software 3.0 (Karpathy; see Karpathy 2026)
Programming through prompts, context, tools, and memory. The context window is the new program; the LLM is the interpreter.

What Changed

The December 2025 Inflection

“I have never felt more behind as a programmer.” — Karpathy, Sequoia Ascent 2026 (Karpathy 2026)

Models crossing the threshold: Claude Opus 4.5, Codex 5, Gemini 3

Tools: Claude Code (Feb 2025), Codex CLI (Apr 2025), Gemini CLI (Jun 2025), OpenCode

DHH’s Reversal

Summer 2025 (Lex Fridman podcast):

AI coding tools made “competence drain out of my fingers.” Programming is like playing guitar — you don’t let someone else play for you.

Early 2026 (Hansson 2026):

“It’s more like working on a team… I just review the final outcome, offer guidance when asked, and marvel at how this is possible at all.”

What changed: not his philosophy, but the tools.

What Disciplined Practice Looks Like

Willison’s Framework

“If you’re going to exploit these new tools, you need to be operating at the top of your game.” (Willison 2025c)

AI rewards existing engineering practices:

“AI tools amplify existing expertise.”

Personal Aside: Reproducibility Is Part of the Equation

The readings cite tests, documentation, and version control as what makes agents fly. Reproducibility is conspicuously absent — yet it may matter as much:

  1. Encourages separating side effects and statefulness — reproducible code is easier for agents to reason about. Functional programming’s edge in producing correct programs is the same advantage. An agent working on a pure, stateless module makes far fewer silent mistakes than one entangled in global state.

  2. Enables agent bootstrapping — a fully reproducible project can tell an agent: clone, run the install script, run the tests, fix failures. E.g. curl -fsSL https://pixi.sh/install.sh | sh && pixi run test. The agent can spin up its own environment in a sandbox or cloud instance with no human scaffolding. Good CI practices are the same idea, already automated.

  3. Improves verifiability — Karpathy’s thesis is that LLMs advance fastest where outputs can be verified. Reproducibility makes outputs verifiable: the same input must produce the same output, which is a directly checkable invariant.

  4. Enables safe sandboxing — a self-contained, reproducible project can be handed to an agent running in a throwaway container or remote VM. This directly mitigates the exfiltration risk from the lethal trifecta (Willison 2025b): private data stays out of scope by construction.

  5. Bisectability — deterministic builds make git bisect reliable. An agent can automatically bisect a regression, confident that differences in output reflect code changes rather than environment drift.

  6. Idempotency — reproducible workflows tend to be idempotent. Agents can safely retry failed steps, re-run the pipeline, or roll back without worrying about accumulated side effects corrupting state.
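Points 3 and 6 above can be turned into checks that an agent (or CI) runs on its own. The following is a minimal sketch, not a prescribed method: the helper names `fingerprint`, `is_reproducible`, and `is_idempotent`, and the four toy functions, are all hypothetical and only illustrate the invariants. It assumes outputs are JSON-serialisable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def fingerprint(value: Any) -> str:
    """Hash a JSON-serialisable value in a canonical (key-sorted) form."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproducible(fn: Callable[[Any], Any], sample_input: Any, runs: int = 3) -> bool:
    """Point 3: the same input must produce the same output on every run."""
    digests = {fingerprint(fn(sample_input)) for _ in range(runs)}
    return len(digests) == 1


def is_idempotent(step: Callable[[Dict], Dict], state: Dict) -> bool:
    """Point 6: applying a step twice leaves the same state as applying it once."""
    once = step(dict(state))
    twice = step(dict(once))
    return fingerprint(once) == fingerprint(twice)


# Toy examples (hypothetical): one pure function, one with hidden global state.
def summarise(values: List[float]) -> Dict[str, float]:
    return {"n": len(values), "mean": sum(values) / len(values)}


_calls: List[int] = []


def summarise_with_hidden_state(values: List[float]) -> Dict[str, float]:
    _calls.append(1)  # global state leaks into the output
    return {"n": len(values), "call": len(_calls)}


# Toy pipeline steps (hypothetical): one safe to retry, one not.
def ensure_output_dir(state: Dict) -> Dict:
    dirs = sorted(set(state.get("dirs", [])) | {"out"})
    return {**state, "dirs": dirs}


def append_log(state: Dict) -> Dict:
    return {**state, "log": state.get("log", []) + ["ran"]}


print(is_reproducible(summarise, [1, 2, 3]))                    # True
print(is_reproducible(summarise_with_hidden_state, [1, 2, 3]))  # False
print(is_idempotent(ensure_output_dir, {"dirs": []}))           # True
print(is_idempotent(append_log, {}))                            # False
```

Both checks reduce to comparing fingerprints, which is what makes them cheap for an agent to run after every change.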

Designing Agentic Loops

An effective loop needs (Willison 2025a):

  1. A clear goal with success criteria
  2. Tools the agent can use (shell, tests, linters, packages)
  3. A feedback loop (run → check → iterate)

Works best for: debugging, performance optimisation, dependency upgrades, refactoring — anywhere with trial-and-error and verifiable outcomes.

Cf. Karpathy’s verifiability thesis (Karpathy 2026): AI advances fastest where outputs can be verified (tests pass/fail, code compiles/crashes, benchmarks improve).
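The three ingredients above (goal with success criteria, tools, feedback) can be sketched as a small driver. This is an illustrative toy, not Willison's code or any real agent framework: `agentic_loop`, `make_toy_agent`, and `check_square_is_49` are hypothetical names, and the "agent" is a stub that bisects on textual feedback where a real loop would call an LLM and run tests.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int
    history: List[Any]


def agentic_loop(
    act: Callable[[str], Any],
    check: Callable[[Any], Tuple[bool, str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Run -> check -> iterate until the check passes or the budget runs out."""
    feedback = ""
    history: List[Any] = []
    for i in range(1, max_iterations + 1):
        candidate = act(feedback)        # run: the agent produces an attempt
        history.append(candidate)
        ok, feedback = check(candidate)  # check: a verifiable success criterion
        if ok:
            return LoopResult(True, i, history)
    return LoopResult(False, max_iterations, history)


def make_toy_agent(low: int, high: int) -> Callable[[str], int]:
    """Stub agent: narrows an integer range using the checker's feedback."""
    bounds = {"low": low, "high": high, "last": None}

    def act(feedback: str) -> int:
        if feedback == "too low":
            bounds["low"] = bounds["last"] + 1
        elif feedback == "too high":
            bounds["high"] = bounds["last"] - 1
        bounds["last"] = (bounds["low"] + bounds["high"]) // 2
        return bounds["last"]

    return act


def check_square_is_49(candidate: int) -> Tuple[bool, str]:
    """Success criterion: the candidate squared equals 49."""
    if candidate * candidate == 49:
        return True, "ok"
    return False, ("too low" if candidate * candidate < 49 else "too high")


result = agentic_loop(make_toy_agent(0, 100), check_square_is_49, max_iterations=10)
print(result.succeeded, result.iterations, result.history)
# True 7 [50, 24, 11, 5, 8, 6, 7]
```

The iteration budget matters as much as the check: with `max_iterations=3` the same loop stops unsuccessfully instead of running forever.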

When to Use an Agent: A Personal Heuristic

If you know how to do it
Try it with an agent anyway. You can evaluate the output, catch mistakes, and learn to direct agents effectively.
If you don’t know how to do it
Don’t delegate to an agent (yet). You cannot evaluate the output — and having it done for you suppresses the learning you need.

The correct moment to introduce an agent is after you understand the problem well enough to judge the answer.

What’s at Stake

The Amplification Claim

The consensus across readings:

“AI tools amplify existing expertise.” — Willison (Willison 2025c)

“Vibe coding raises the floor. Agentic engineering raises the ceiling.” — Karpathy (Karpathy 2026)

Evidence:

But is this a comforting just-so story that flatters senior engineers?

Identity, Craft, and Authorship

DHH’s guitar analogy (Lex Fridman podcast, summer 2025):

The pleasure of programming is in the playing, not just the output.

DHH reversed himself in six months (Orosz 2026). But the feeling didn’t vanish for everyone.

Orosz survey: some Builders report grief at no longer coding by hand (Orosz and Nilsson 2026).

Karpathy (Karpathy 2026):

“You can outsource your thinking, but you can’t outsource your understanding.”

Open questions:

The Taste–Talent Gap

Ira Glass’s observation: individuals acquire taste much faster than talent.

Coding agents narrow this gap:

The flip side — if you lack taste or domain knowledge:

You cannot judge outputs you cannot recognise as wrong.

A deeper corollary: as AI capability surpasses human ability in a domain, the human stops noticing further advances — because evaluation requires being close enough to the ceiling to see it.

Economic Dimension (for reference — not for discussion today)

Orosz & Nilsson survey (900+ engineers) (Orosz and Nilsson 2026):

Three archetypes:

Builders
Care about quality and craft. Benefit from AI for large changes. Risk: identity loss, AI slop.
Shippers
Focus on outcomes. Most enthusiastic about AI. Risk: faster tech debt, building the wrong things.
Coasters
Can uplevel faster with AI. Risk: generating slop that frustrates Builders.

Discussion

Over to You

Willison’s vibe engineering requires rigorous tests and specs to be responsible. What does responsible agentic coding look like for research software, where the test harness often doesn’t exist and “correct” is contested?

Discussion Questions

  1. DHH reversed his position in ~6 months. Has anyone here had a similar shift? What triggered it?

  2. Willison says AI amplifies existing expertise. Does that match your experience — is there a skill AI makes less valuable?

  3. “You can outsource your thinking, but you can’t outsource your understanding.” Where’s the line for research software?

  4. If an RSE uses an agent to write code for a research project, who is responsible for the correctness of that code? Does the answer change if it produces a published result?

  5. Individual RSEs may have strong AI preferences — from agent-first to AI-free — and project PIs have their own requirements driven by personal preference or funding constraints. Should RSE–project matching take these into account, so that an RSE who only works with agents is not assigned to a project that prohibits AI use, and vice versa?

Non-Questions (out of scope today)

Readings

  1. New, optional: Karpathy (2026)’s From Vibe Coding to Agentic Engineering
  2. Mandatory: Willison (2025c)’s Vibe engineering
  3. Mandatory: Orosz (2026)’s DHH’s new way of writing code, a summary of a two-hour podcast.
  4. Recommended: Willison (2025a)’s Designing agentic loops: good companion to (2) above.
  5. Optional: Orosz and Nilsson (2026)’s The impact of AI on software engineers in 2026: key trends
  6. Optional: Hansson (2026)’s Promoting AI agents: DHH’s own (shorter) blog post making a statement similar to (3) above.

Karpathy (2026)

The unit of programming changed from typing lines of code to delegating larger “macro actions”… This is why I think the profession is being refactored. The programmer is increasingly not just a code writer, but an orchestrator of agents.

My core automation framework is:

capability spike ~= verifiability x training attention x data coverage x economic value

I distinguish two related but different ideas:

The old “10x engineer” idea may become much more extreme. People who master agentic workflows may outperform others by far more than 10x.

This means products need agent-native surfaces… I think about this in terms of sensors and actuators. A sensor turns some state of the world into digital information. An actuator lets an agent change something. The future stack is agents using sensors and actuators on behalf of people and organizations.

The right posture is neither dismissal nor blind trust. It is empirical familiarity: learn where they work, where they fail, what they were trained for, and how to build guardrails around them.

You can outsource your thinking, but you can’t outsource your understanding.

The scarce thing is shifting:

My current worldview is not that AI simply makes everyone faster at the old work. It is that the work itself is being reorganized around agents. Software, research, education, infrastructure, and knowledge work are all becoming variations of the same pattern:

Willison (2025c)

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works.

If you’re going to really exploit the capabilities of these new tools, you need to be operating at the top of your game. You’re not just responsible for writing the code—you’re researching approaches, deciding on high-level architecture, writing specifications, defining success criteria, designing agentic loops, planning QA, managing a growing army of weird digital interns who will absolutely cheat if you give them a chance, and spending so much time on code review.

Almost all of these are characteristics of senior software engineers already!

Orosz (2026)

A big win from using AI agents is tackling stuff that you wouldn’t have before. A senior engineer at 37signals ran a “P1 optimization” project to improve the fastest 1% of requests.

Running several AI agents feels less like “project management” and more like “wearing a mech suit.”

37signals has one designer for every two engineers.

AI agents could turn 37signals’ “designer model” into the industry standard.

Command Line Interfaces (CLI) feel like the ultimate AI interface, which validates the Unix philosophy of the 1970s.

Eight hours of sleep is non-negotiable – even during an AI gold rush!

Willison (2025a)

The thing to look out for here are problems with clear success criteria where finding a good solution is likely to involve (potentially slightly tedious) trial and error.

Willison (2025b)

The lethal trifecta of capabilities is: access to private data, exposure to untrusted content, and the ability to communicate externally.
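The three capabilities named in the title of Willison (2025b), private data, untrusted content, and external communication, can be modelled as three flags on an agent session. The sketch below is an assumption-laden illustration: the `AgentCapabilities` class and its field names are hypothetical and do not come from the post or from any real tool. It shows why the throwaway sandbox from the reproducibility aside helps: removing any one leg breaks the trifecta.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent session is allowed to do."""

    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool

    def active_legs(self) -> List[str]:
        """List which of the three risky capabilities are enabled."""
        legs = []
        if self.reads_private_data:
            legs.append("private data")
        if self.sees_untrusted_content:
            legs.append("untrusted content")
        if self.can_communicate_externally:
            legs.append("external communication")
        return legs

    def has_lethal_trifecta(self) -> bool:
        """True only when all three capabilities are present together."""
        return len(self.active_legs()) == 3


# An agent on a personal laptop with full access (assumed configuration).
laptop_agent = AgentCapabilities(True, True, True)
# The same agent in a throwaway container holding only the reproducible project.
sandboxed_agent = AgentCapabilities(False, True, True)

print(laptop_agent.has_lethal_trifecta())     # True
print(sandboxed_agent.has_lethal_trifecta())  # False
print(sandboxed_agent.active_legs())          # ['untrusted content', 'external communication']
```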

References

Hansson, David Heinemeier. 2026. “Promoting AI Agents.” Blog. David Heinemeier Hansson, January 7. https://world.hey.com/dhh/promoting-ai-agents-3ee04945.
Karpathy, Andrej. 2026. “From Vibe Coding to Agentic Engineering.” Blog. Bear Blog of Andrej Karpathy, April 30. https://karpathy.bearblog.dev/sequoia-ascent-2026/.
Orosz, Gergely. 2026. “DHH’s New Way of Writing Code.” Newsletter. The Pragmatic Engineer, April 8. https://newsletter.pragmaticengineer.com/p/dhhs-new-way-of-writing-code.
Orosz, Gergely, and Elin Nilsson. 2026. “The Impact of AI on Software Engineers in 2026: Key Trends.” Newsletter. The Pragmatic Engineer, April 14. https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026.
Willison, Simon. 2025a. “Designing Agentic Loops.” Blog. Simon Willison’s Weblog, September 30. https://simonwillison.net/2025/Sep/30/designing-agentic-loops/.
Willison, Simon. 2025b. “The Lethal Trifecta for AI Agents: Private Data, Untrusted Content, and External Communication.” Blog. Simon Willison’s Weblog, June 16. https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/.
Willison, Simon. 2025c. “Vibe Engineering.” Blog. Simon Willison’s Weblog, October 7. https://simonwillison.net/2025/Oct/7/vibe-engineering/.