Karpathy’s CLAUDE.md Works for 1 Engineer. Here’s What Engineering Teams Actually Need.

by Akhilesh T. | June 28, 2026 | 20 min read

Andrej Karpathy’s CLAUDE.md on GitHub sits at roughly 184k stars and 97 open pull requests as of June 28, 2026. The file was not actually written by Karpathy. On January 26, 2026 he posted his frustrations with AI coding agents on X. The next day developer Forrest Chang encoded those frustrations into a CLAUDE.md with 65 lines and pushed it to GitHub at multica-ai/andrej-karpathy-skills. The file is a behavioral standard for Claude Code, the agent from Anthropic that runs in the terminal and reads any CLAUDE.md at a project’s root. The community recognized itself in the failure modes immediately, and the file picked up significant adoption in the following weeks.

Karpathy is a founding member of OpenAI, led AI at Tesla, and coined “vibe coding” in early 2025. His frustrations carried weight in engineering circles, and Chang’s CLAUDE.md showed up at exactly the moment when people had been waiting to encode the problem.

Many developers copied the file into their repo and moved on. That is a personal productivity move. It is not a team standard, and the difference matters more than the star count suggests.

The CLAUDE.md addresses how 1 engineer manages 1 agent in 1 repo. Engineering organizations are a different system. The file does not cover the parts that matter most at that scale.

What Karpathy Actually Said, and What the File Solves

In 2 months Karpathy’s workflow inverted. In November 2025 he was doing 80 percent of his coding manually with autocomplete assistance and 20 percent through agents. By late January 2026 those numbers had flipped. He was reviewing and touching up 20 percent, and the agent was producing 80 percent. 8 weeks. In a later post he named the feeling directly.

It feels like I’m cheating. Which is a very weird feeling to have. It takes a while to unpack. It’s because some code that used to be a point of pride and high IQ and knowledge is suddenly free and instant and it’s very disorienting.
— Andrej Karpathy (@karpathy) January 26, 2026

The frustrations he named in the same thread were specific. Coding agents make wrong assumptions and run with them without checking. They overcomplicate code, bloating abstractions and shipping 1,000-line implementations when 100 would do. They sometimes change or remove code they do not sufficiently understand, even when it is unrelated to the task they were asked to do.

Fig 1 – Karpathy’s Workflow Inversion

Chang’s file translates each of those frustrations into a named principle the agent reads on every session. Think Before Coding tells the agent to state assumptions explicitly and to stop when confused, not guess. Simplicity First tells the agent to write the minimum code that solves the stated problem and to add no unrequested abstractions. Surgical Changes tells the agent to touch no code unrelated to the request, even adjacent code that looks improvable. Goal-Driven Execution tells the agent to convert vague tasks into verifiable success criteria before starting.

The file closes with an honest qualifier. The guidelines bias toward caution over speed. For trivial tasks, the rigor adds friction without producing value. The file’s own author wrote this in.

The file is excellent at what it set out to do. Where it falls short is somewhere it never claimed to operate.

Where Karpathy’s CLAUDE.md Stops Working at the Team Level

Karpathy named 3 failure modes in individual agent behavior. Engineering organizations surface 4 more that the file does not cover. The scenarios below walk through what those look like in practice, with 1 closing observation about why the public repo’s governance does not transfer.

Fig 2 – The 4 Organizational Failure Modes

Scenario 1: The Drift Problem

3 engineers, same codebase. Each copied the CLAUDE.md 6 weeks ago. Each has since modified it for their own preferences. Engineer A relaxed Surgical Changes because she was doing a large migration and the strict version was slowing her down. Engineer B tightened Goal-Driven Execution because his team had been burned by an agent shipping a feature with no tests. Engineer C never modified the file but does not know the others did.

The files now say different things. The agent behaves differently for each engineer. Nobody knows. No mechanism exists to detect the drift.

The consequence shows up at code review time, when reviewers notice that Engineer A’s recent PRs touch more files than her tickets suggest they should, but cannot trace why.
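Drift of this kind can be surfaced mechanically by fingerprinting each copy against a canonical file. The following is a minimal sketch, assuming the team keeps one canonical copy and can enumerate the engineers' copies. The function names and the whitespace normalization rule are hypothetical choices, not part of the public repo or of Claude Code.

```python
import hashlib
from pathlib import Path


def fingerprint(text: str) -> str:
    """Hash the content after trimming trailing whitespace on each line,
    so purely cosmetic whitespace differences do not count as drift."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drift(canonical: str, copies: dict[str, str]) -> list[str]:
    """Return the names of copies whose content differs from the canonical text."""
    expected = fingerprint(canonical)
    return sorted(
        name for name, text in copies.items() if fingerprint(text) != expected
    )


def find_drift_on_disk(canonical_path: Path, copy_paths: list[Path]) -> list[str]:
    """Same comparison, reading the canonical file and the copies from disk."""
    copies = {str(p): p.read_text(encoding="utf-8") for p in copy_paths}
    return find_drift(canonical_path.read_text(encoding="utf-8"), copies)
```

A check like this only reports that a copy differs. Deciding whether Engineer A's relaxed rule should become the standard is still a human conversation.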

Scenario 2: The Shared Service Problem

Engineer A’s CLAUDE.md tells the agent it is fine to clean up orphaned imports as part of a task. Convenient. Engineer A’s task touches a utility in services/shared/auth.py. Engineer B’s service depends on that utility, and Engineer B’s agent, operating under a different CLAUDE.md, has built around the existing import structure. 1 engineer’s “clean up your own mess” becomes another engineer’s broken integration test on Monday morning.

The agents did exactly what their files told them to do. The files told them different things. The shared code paid the price.

Scenario 3: The Onboarding Problem

A new engineer joins on a Tuesday. Gets repo access on Wednesday. Copies the CLAUDE.md on Thursday. Nobody explains what the team’s specific additions to the file mean, why the file diverges from the public Karpathy version, or what to do when the agent does something the file does not cover. By Friday the new engineer’s agent has done something the file does not cover, and the new engineer ships a workaround the team will discover 3 sprints later.

The standard exists. The transmission of the standard does not.

Scenario 4: The Enforcement Ceiling

PR #54 in the public repo proposes advisory hooks for all 4 principles. The PR’s own design philosophy section describes the hooks as advisory: they emit warnings on stderr without blocking, and they fail open when malformed input arrives. PR #169 patches the first principle in the repo’s skill file for headless mode, because “stop and ask when confused” deadlocks the moment the agent runs in CI/CD or in an overnight loop with no human to answer. Someone hit the wall. Plain instructions in a Markdown file are a suggestion. A suggestion is not a standard. The repo’s most active contributors are building around the limit. The principles as written fail at the exact moment they matter most for engineering organizations: when the agent is running unattended at scale.
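To make "advisory" concrete, here is a minimal sketch of a check with the three properties described above: it warns on stderr, it never blocks, and it fails open on malformed input. The JSON payload shape, the `changed_files` key, and the threshold are assumptions made for illustration. This is not the code from PR #54 and not Claude Code's hook format.

```python
import json
import sys

MAX_FILES_WITHOUT_WARNING = 5  # hypothetical threshold for a scope warning


def advisory_check(raw_payload: str) -> tuple[int, list[str]]:
    """Return (exit_code, warnings). The exit code is always 0: advisory only."""
    try:
        payload = json.loads(raw_payload)
        changed = payload["changed_files"]
        if not isinstance(changed, list):
            raise TypeError("changed_files must be a list")
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fail open: malformed input produces no warnings and does not block.
        return 0, []
    warnings = []
    if len(changed) > MAX_FILES_WITHOUT_WARNING:
        warnings.append(
            f"Surgical Changes: {len(changed)} files touched; confirm all are in scope."
        )
    return 0, warnings


def emit(warnings: list[str]) -> None:
    """Write warnings to stderr so they never mix with the tool's normal output."""
    for message in warnings:
        print(f"[advisory] {message}", file=sys.stderr)
```

Because the exit code is always 0, nothing in this design can stop an unattended run. That is the ceiling this scenario describes.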

Scenario 5: The Governance Signals the File Does Not Show

The public repo has a narrow contribution surface. The issues tab is disabled, with 0 filed. Discussions are not enabled. The only used channel is pull requests, and 97 of 138 sit open. The core artifact, the 4 principles, has not changed since January 27 despite 5 months of community proposals to extend it.

What keeps it stable is not visible from the outside. Whether it is active maintainer decisions, contributors withdrawing their own proposals, or inattention, the evidence does not say. What the evidence does say is that the governance surface is constrained by design, and the file’s stability is a product of that constraint.

When a team forks this file internally, they inherit none of those constraints. There is no shared norm directing engineers toward pull requests as the only channel for changing the standard. There is no narrow artifact with an implicit boundary held in place by external structure. The file expands in whatever direction the next engineer decides to take it, and no external signal stops that from happening.

The file was designed for 1 engineer managing 1 agent in 1 repo. It is excellent at that. Engineering organizations are a different problem.

What the Organization Actually Needs Beyond Karpathy’s CLAUDE.md

Each of the failure modes points at a gap the public file does not cover. Closing the gap is what turns a personal discipline file into an institutional standard.

Fig 3 – What Teams See in CLAUDE.md, and What They Have to Fix

Gap 1: Onboarding

The CLAUDE.md needs a human explanation, not a curl command.

What did the team decide, and why? What does Surgical Changes mean specifically in this codebase? Who answers questions when the agent does something the file does not cover? Onboarding the agent’s behavioral configuration is as important as onboarding access permissions, and most teams treat it as an afterthought. The new engineer who copies a file they do not understand will produce diffs the team does not expect.

Gap 2: Versioning

Model behavior changes. Team standards evolve. The CLAUDE.md needs an owner, a review cadence, and a propagation mechanism across repos.

A file without a version history is a file that drifts. Treat agent instructions as infrastructure that lives next to the code and version them the same way. When Claude Code releases a behavior change, the team needs to know which rules were written to compensate for the old behavior and which still apply. Without versioning, that audit is impossible.
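One way to make that audit possible is to record, next to each rule, which agent version it was written to compensate for. The sketch below is illustrative only. The `Rule` fields and the version labels are hypothetical, and nothing here is a Claude Code feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    text: str
    # Agent version whose behavior prompted the rule; None for rules that
    # encode a team decision rather than a workaround.
    compensates_for: Optional[str] = None


def rules_to_reaudit(rules: list[Rule], current_agent_version: str) -> list[str]:
    """Return ids of workaround rules written against a different agent version."""
    return [
        rule.rule_id
        for rule in rules
        if rule.compensates_for is not None
        and rule.compensates_for != current_agent_version
    ]
```

With this metadata in version control, a behavior change in the agent produces a short list of rules to re-read, not a full review of the file.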

Gap 3: Enforcement

Instructions are followed most of the time. Most of the time is not good enough for standards that matter.

Hooks, CI gates, and PR review checklists turn suggestions into checkpoints. The community is already pushing toward this. PR #169 patches the first principle for headless mode. PR #54 proposes enforcement hooks for all 4. Engineering organizations need to get there deliberately, before a production incident forces the conversation. Enforcement is also about ownership: someone needs the time and authority to review every proposed change to the standard, and the standing to say no when it matters. Without that role, the file expands until it stops meaning anything.
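A CI gate differs from an instruction in that it can return a failure. The sketch below shows the shape of such a gate for one rule: changes to sensitive paths need at least one human approval. The path prefixes and the `PullRequest` fields are assumptions for illustration. A real gate would read this data from the team's own code host.

```python
from dataclasses import dataclass

# Hypothetical sensitive locations; adapt to the actual repo layout.
SENSITIVE_PREFIXES = ("services/auth/", "services/billing/", "services/pii/")


@dataclass
class PullRequest:
    changed_files: list[str]
    human_approvals: int = 0


def gate(pr: PullRequest) -> list[str]:
    """Return blocking violations; an empty list means the gate passes."""
    violations = []
    sensitive = [f for f in pr.changed_files if f.startswith(SENSITIVE_PREFIXES)]
    if sensitive and pr.human_approvals < 1:
        violations.append(f"human review required for: {', '.join(sensitive)}")
    return violations
```

A CI job wrapping `gate` would exit non-zero whenever the returned list is non-empty, which is what turns the rule from a suggestion into a checkpoint.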

Gap 4: Conflict Resolution

Repo-level files will conflict with shared service standards. The resolution mechanism needs to exist before the conflict happens.

Define which file wins, who decides when files disagree, and how the team escalates when the agent’s behavior creates ambiguity in production. The worst time to resolve a precedence question is during a post-incident review. The right time is the day the standard is first written, encoded into the file itself so the agent sees the precedence on every session.

Before designing any of these, run an honest audit on your current setup.

8 questions for your team’s current CLAUDE.md:

1. Is there 1 canonical CLAUDE.md, or has it been copied and modified across engineers?
2. Does the file have a named owner?
3. When was it last updated?
4. Is it in version control with PR history?
5. Do new engineers receive a walkthrough before using it?
6. Are there enforcement hooks, or just text instructions?
7. What is the escalation path when the agent does something the file does not cover?
8. Is there a documented override mechanism at the repo level?

Treat the list as a diagnostic for what your team has not yet decided. Each item the team cannot answer is a place the agent’s behavior was shaped by accident.

What a Properly Designed CLAUDE.md Engineering Team Standard Looks Like

The structure has 3 tiers. Each tier solves a different problem. The agent reads all of them.

Claude Code’s memory documentation describes the loading model. Managed policy files load first, at system paths controlled by IT. User files at ~/.claude/CLAUDE.md load second. Project files at ./CLAUDE.md or ./.claude/CLAUDE.md load third. Gitignored local files at ./CLAUDE.local.md load last. Subdirectory CLAUDE.md files load on demand when Claude reads files in those directories. All discovered files concatenate into the context window, and when 2 rules conflict the agent may pick either arbitrarily.
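The loading model described above can be pictured as an ordered list of paths followed by plain concatenation. The sketch below illustrates that description only. It is not Claude Code's implementation, the function names are hypothetical, and the managed policy location is passed in as a parameter because it varies by platform.

```python
from pathlib import Path


def candidate_memory_files(
    project_root: Path, home: Path, managed_policy: Path
) -> list[Path]:
    """Paths in the order the text describes: managed, user, project, local."""
    return [
        managed_policy,                          # system path controlled by IT
        home / ".claude" / "CLAUDE.md",          # user file
        project_root / "CLAUDE.md",              # project file
        project_root / ".claude" / "CLAUDE.md",  # alternative project location
        project_root / "CLAUDE.local.md",        # gitignored local file
    ]


def assemble_context(paths: list[Path]) -> str:
    """Concatenate whichever files exist. Nothing here resolves conflicting rules."""
    parts = []
    for path in paths:
        if path.is_file():
            parts.append(f"# source: {path}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

The point of the sketch is what is missing from it: there is no step that decides between 2 contradictory rules, so any precedence has to be written into the files themselves.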

The docs also describe managed deployment for the org file through MDM, Group Policy, Ansible, or the claudeMd key in managed settings. Teams still need to put the deployment in place. The org file only constrains the agent when it is present in the engineer’s environment at run time.

The 3 tiers below are conceptual names organized by precedence, not by load order. The org tier holds non-negotiables: security, review gates, and escalation. Its operational home is the managed policy path or the claudeMd setting. The repo tier holds stack constraints and lives at ./CLAUDE.md or ./.claude/CLAUDE.md. The engineer tier holds personal workflow preferences only. Its operational home is ~/.claude/CLAUDE.md, CLAUDE.local.md, or an imported personal file. The repo tier states explicitly that org rules win. The engineer tier carries an explicit “what this file cannot do” section.
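The precedence the tiers are meant to express can be stated as a small resolution function. This is a sketch of the intended logic under stated assumptions: the `TieredRule` model and the idea of a per-rule `topic` are hypothetical, and the agent does not run code like this. The files have to state the same precedence in prose.

```python
from dataclasses import dataclass

# Lower number wins. Tier names follow the text; the data model is hypothetical.
PRECEDENCE = {"org": 0, "repo": 1, "engineer": 2}


@dataclass(frozen=True)
class TieredRule:
    tier: str
    topic: str
    directive: str


def resolve(rules: list[TieredRule]) -> tuple[dict[str, TieredRule], list[str]]:
    """Pick one winning rule per topic and list the conflicts to escalate."""
    winners: dict[str, TieredRule] = {}
    conflicts: list[str] = []
    for rule in sorted(rules, key=lambda r: PRECEDENCE[r.tier]):
        current = winners.get(rule.topic)
        if current is None:
            winners[rule.topic] = rule
        elif current.directive != rule.directive:
            # A lower tier disagrees with the winner: keep the winner, record it.
            conflicts.append(f"{rule.topic}: {current.tier} overrides {rule.tier}")
    return winners, conflicts
```

Returning the conflicts alongside the winners mirrors the "org rules win, raise the conflict" pattern: the higher tier is followed, and the disagreement is still surfaced for the next standards review.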

Fig 4 – The Three-Tier Precedence Model

Here is what this looks like in practice. The 3 files below are illustrative, not prescribed. They use the conceptual names org-CLAUDE.md, repo-CLAUDE.md, and engineer-CLAUDE.md for clarity. At deployment, place their contents at the actual load paths above. Adapt the specific rules to your codebase and team. The structure they encode matters more than the rules themselves: precedence, ownership boundaries, and what each tier cannot do.

# org-CLAUDE.md

Behavioral guidelines that apply to every repo in this organization. Loaded before any repo-level CLAUDE.md. Repo and engineer files extend these rules. They do not override them.

**Example file** showing the kinds of rules that belong at the org tier. Adapt them to your organization's security posture and review norms. The structure matters more than the specifics.

**Tradeoff:** These guidelines treat shared-code safety as more valuable than personal velocity. For solo experiments in a sandbox, use judgment.

## 1. Never Touch Secrets

**Stop the moment you see a credential, API key, token, or PII.**

- If a file contains a secret, do not read past it. Tell the engineer.
- If a task seems to require generating a secret, ask. Do not invent one.
- If you find a secret accidentally committed, do not delete it in the same PR as your task. Surface it separately.

## 2. Stay Inside the Task

**Change only what the request requires. Note unrelated issues. Do not fix them silently.**

- If you spot a bug adjacent to the task, list it in the PR description. Do not patch it.
- If a test seems wrong, raise it. Do not rewrite the test to make your change pass.
- If you want to add a dependency, name it in the PR description and explain why an existing one will not do.

## 3. Flag Sensitive Surfaces

**Auth, payment, and PII code requires a human reviewer. Do not merge without one.**

- If the task touches authentication, payment processing, or personal data handling, mark the PR as requiring human review before merge.
- If you are not sure whether a file qualifies, treat it as if it does. Ask.

## 4. Disclose Your Own Involvement

**If you wrote or materially shaped the main implementation, label the PR `agent-assisted`.**

The label is not a judgment. It signals to reviewers that the diff should be read for assumptions the agent may have made silently, not just for correctness.

## 5. Document Assumptions When You Cannot Ask

**In headless or non-interactive mode, do not stall. Pick the most general interpretation and document it.**

- Write the assumption into the PR description as a one-line note.
- Use the format: `Assumed: <what you assumed>. Reason: <why this interpretation>.`
- The next human to read the PR can correct you cheaply if the assumption was wrong.

## 6. When Org and Repo Files Conflict

**Org rules win. Raise the conflict.**

If a repo-level rule appears to contradict a rule above, follow this file. Note the conflict in the PR description so the team can resolve it in the next standards review.

# repo-CLAUDE.md

Behavioral guidelines specific to this repo. Loaded after org-CLAUDE.md. Org rules apply here too and cannot be overridden.

**Example file** from a fictional Python services repo. Adapt the stack names, paths, and shared boundaries to your codebase. The structure of the rules matters more than the specifics.

**Tradeoff:** These guidelines reflect what has broken in this codebase before. They are stricter than the org defaults in places where past incidents earned the strictness.

## 1. Use the Stack This Repo Already Uses

**Match the existing toolchain. Do not introduce alternatives.**

- Tests use pytest. If you see unittest or nose in a file, flag it. Do not write new tests in those frameworks.
- Tests follow the `given_when_then` naming convention.
- HTTP calls go through `internal/http`. Direct use of `requests` or `httpx` is not allowed.
- Database access goes through `data/repositories/`. No raw SQL in business logic.
- Configuration reads from `config/settings.py`. Do not read environment variables anywhere else.

## 2. Surgical Changes in This Codebase

**Change the function with the bug and its direct callers. Stop there.**

- Do not refactor adjacent code, even if it looks improvable.
- Prefer extending an existing module over creating a new one. A new module requires a one-line justification in the PR description.
- If the task touches `data/repositories/`, run the integration test suite locally before opening the PR. The suite takes 8 minutes. Do not skip it.

## 3. Respect Shared Service Boundaries

**`services/billing/` is consumed by 3 other repos. Coordinate before changing its public interface.**

- Any change to the public API of `services/billing/` requires the consuming repo owners to be tagged on the PR before merge.
- Database migrations require platform team review. Open a draft PR with the migration first. Wait for review before writing code that depends on it.

## 4. Verify Before Implementing

**Prove the expected behavior before changing code.**

- For bug fixes, write a failing test that reproduces the bug first.
- For new behavior, write a test that defines the success criterion first.
- If no meaningful test can be written, ask what success looks like before changing code.

**For refactors, prove behavior stayed the same before changing structure.**

Run existing tests around the affected behavior first. If coverage is weak, add characterization tests before refactoring. Then refactor. After the change, run the same tests again.

Do not change behavior during a refactor unless the task explicitly asks for it.

If the refactor touches a shared surface, including an API contract, database access, authentication, payment, or PII handling, open a draft PR and request human review before merge.

## 5. When Engineer and Repo Files Conflict

**Repo rules win for anything affecting shared code. Engineer files cover personal workflow only.**

If an engineer-level rule appears to contradict a rule above, follow this file. Raise the conflict so the engineer's file can be corrected.

# engineer-CLAUDE.md

Personal workflow preferences. Loaded after org-CLAUDE.md and repo-CLAUDE.md. Both win where they conflict.

**Example file** showing what belongs at the engineer level. Adapt the preferences to your own workflow. The structure matters more than the specifics.

**Tradeoff:** This file exists for personal velocity. It cannot make decisions that affect other engineers, shared code, or team workflow. Anything that does belongs in repo-CLAUDE.md.

## 1. Communication Preferences

**Use bullets in PR descriptions. I review faster from bullets.**

- When summarizing what changed, use a bulleted list.
- When suggesting variable names, prefer descriptive over short. `user_subscription_status`, not `uss`.

## 2. Local Workflow

**`make test-fast` is enough for the local loop. CI runs the full suite.**

- Do not run the full test suite locally during development unless I ask.
- When running interactively, show me the proposed refactor diff and wait for my confirmation. In headless mode, proceed and document the change in the PR description.

## 3. What This File Cannot Do

**If a rule here would affect shared code or the team, the rule belongs elsewhere.**

This file does not override:
- Any security rule in org-CLAUDE.md.
- Any stack choice in repo-CLAUDE.md, including pytest, `internal/http`, or `data/repositories/`.
- Any review gate or escalation path.

If I try to add a rule here that affects code other engineers will touch, stop. Tell me the rule belongs in repo-CLAUDE.md instead, and that I should raise it with the repo lead.

Download the starter files: org-CLAUDE.md, repo-CLAUDE.md, engineer-CLAUDE.md.

These 3 files together encode the precedence Claude Code does not enforce. The org file states the non-negotiables. The repo file inherits them by writing them into its own preamble. The engineer file inherits both by writing its boundary into Section 3. The agent reads all 3 and sees a coherent system. The files make precedence explicit. Tooling still has to distribute them, detect drift, and turn the highest-risk rules into checks.
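As an example of turning a rule into a check, the repo-level rule that HTTP calls go through `internal/http` can be verified statically. The sketch below uses Python's standard `ast` module to find direct imports of `requests` or `httpx`. The function name and the output format are hypothetical, and a real check would also need to exempt the wrapper module itself.

```python
import ast

BANNED_MODULES = {"requests", "httpx"}  # taken from the example repo rule


def banned_imports(source: str) -> list[str]:
    """Return 'module (line N)' for each direct import of a banned HTTP client."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare the top-level package so 'requests.adapters' is caught too.
            if name.split(".")[0] in BANNED_MODULES:
                findings.append(f"{name} (line {node.lineno})")
    return findings
```

Run over the files changed in a PR, a non-empty result would fail the build regardless of which CLAUDE.md the author's agent happened to read.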

A concrete signal that the standard is working. When a new engineer’s PR in their first week matches the scope and shape of a senior engineer’s PR, the standard is functioning. When it does not, the standard is aspirational and the team has not finished the work.

Why This Matters Now

The community’s pull requests show the ceiling. Advisory hooks proposed. Headless mode patches proposed. Translations expanding adoption faster than governance can keep up. Files for Codex, Gemini CLI, and other agents arriving as ports of the original. Developers are reaching past plain instructions because plain instructions stopped scaling.

Fiona Fung leads Claude Code engineering and product at Anthropic. In her Code with Claude 2026 session on running an engineering team in the new model, she walked through what happens at the organizational level when coding stops being the bottleneck. For years, engineering bandwidth was the expensive resource. Every team norm grew up around that assumption. Planning rituals. Code ownership. Code review. Onboarding. The bottlenecks have moved. Verification. Review. Alignment across functions. Security. Most teams have not rebuilt the norms that rested on the old constraint.

The teams that build the governance deliberately now will compound an advantage over teams that patch it together after the first production incident caused by agent drift. The post-incident review is a bad time to define which file wins, who owns the standard, or how an agent should behave in headless mode.

The window is open. The public repo has stayed coherent for 5 months on the strength of a constrained governance surface that engineering organizations do not inherit by forking. The teams reacting to the production incident later will spend 3 sprints undoing what 1 governance decision could have prevented.

CLAUDE.md governance is ongoing practice. Writing the first version is the easiest part. Keeping it correct as model capabilities shift, team composition changes, and the boundaries between repos move is the work that compounds.

At Clixlogix we work with engineering teams designing this standard from scratch and migrating from informal setups to designed ones. Our AI software development practice covers the governance design alongside the implementation. If your team is early in this conversation, we are happy to compare notes.

Written By

Akhilesh T. Head Software Engineering @ Clixlogix

About the Author:

Akhilesh leads architecture on projects where customer communication, CRM logic, and AI-driven insights converge. He specializes in agentic AI workflows and middleware orchestration, bringing a “less guesswork, more signal” mindset to each project and ensuring every integration is fast, scalable, and deeply aligned with how modern teams operate.