Karpathy Skills
forrestchang/andrej-karpathy-skills: a single CLAUDE.md with four rules that directly address Andrej Karpathy’s observations on where LLM coders fall down. Distributed as a Claude Code plugin, or as a file to append to an existing CLAUDE.md.
🔗 Links
- Repo: https://github.com/forrestchang/andrej-karpathy-skills
- Karpathy original tweet: https://x.com/karpathy/status/2015883857489522876
- Cursor variant: https://github.com/forrestchang/andrej-karpathy-skills/blob/main/CURSOR.md
Download or use
# Claude Code plugin (recommended)
/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills
# Per-project (CLAUDE.md)
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
# Append to existing
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
🗒️ Description
🧩 What LLMs get wrong (per Karpathy)
- Wrong assumptions, no clarification: models silently pick an interpretation instead of asking or surfacing doubt
- Overengineering: bloated abstractions, 1000 lines where 100 would do, dead code left in place
- Side-effect edits: they change comments and code they don’t understand, even when orthogonal to the task
- Weak success criteria: without a concrete, checkable definition of “it works”, the model cannot verify its own output and a human has to validate manually
🧩 The four rules
| Principle | Addresses |
|---|---|
| Think Before Coding | Wrong assumptions, hidden confusion, missing tradeoffs |
| Simplicity First | Overcomplication, bloated abstractions |
| Surgical Changes | Orthogonal edits, touching code you shouldn’t |
| Goal-Driven Execution | Leverage through tests-first and verifiable success criteria |
1. Think Before Coding
Stop. State assumptions explicitly. Present multiple interpretations when there’s ambiguity. Push back when a simpler solution exists. Stop & ask when confused.
2. Simplicity First
- No features beyond what was asked
- No abstractions for single-use code
- No “flexibility” that wasn’t requested
- No error handling for impossible scenarios
- 200 lines → 50? Rewrite.
Test: Would a senior engineer say this is overcomplicated? If yes, simplify.
3. Surgical Changes
- Don’t “improve” adjacent code/comments/formatting
- Match existing style, even if you’d do it differently
- Notice unrelated dead code → mention, don’t delete
- Remove imports/vars/funcs that YOUR changes orphaned; don’t remove pre-existing dead code
Test: Every changed line should trace directly to the user’s request.
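The test above can be roughly approximated at file granularity. The following is a minimal sketch, not part of the repo: `out_of_scope` is a hypothetical helper, and the path prefixes are made-up examples. It only flags changed files outside the paths a task was supposed to touch, so it cannot judge individual lines.

```bash
# Hypothetical helper: print changed paths that fall outside the task's scope.
# Reads changed paths on stdin (one per line); arguments are allowed path prefixes.
out_of_scope() {
  local path prefix ok
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    ok=0
    for prefix in "$@"; do
      case "$path" in
        "$prefix"*) ok=1; break ;;   # path starts with an allowed prefix
      esac
    done
    [ "$ok" -eq 1 ] || printf '%s\n' "$path"
  done
}

# Typical use inside a repository (assuming the task only concerned src/auth/):
#   git diff --name-only | out_of_scope src/auth/ tests/auth/

# Standalone demo with made-up paths:
printf '%s\n' src/auth/login.py README.md | out_of_scope src/auth/
# prints: README.md
```

Any path it prints is a candidate drive-by edit worth a second look before the diff is accepted.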
4. Goal-Driven Execution
Transform imperative tasks into declarative goals with verification:
| Instead of… | Use… |
|---|---|
| "Add validation" | "Write tests for invalid inputs, then make them pass" |
| "Fix the bug" | "Write a test that reproduces it, then make it pass" |
| "Refactor X" | "Ensure tests pass before and after" |
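As an illustration of the first row, here is a minimal sketch of "write tests for invalid inputs, then make them pass". The function `validate_port`, its test table, and the port range are hypothetical examples chosen for this note, not something from the repo.

```bash
# Step 1: write the tests first. Each row is "<input> <expected verdict>".
test_validate_port() {
  local input expected actual failures=0
  while read -r input expected; do
    if validate_port "$input"; then actual=valid; else actual=invalid; fi
    if [ "$actual" != "$expected" ]; then
      echo "FAIL: '$input' was $actual, expected $expected"
      failures=$((failures + 1))
    fi
  done <<'EOF'
8080 valid
0 invalid
65536 invalid
abc invalid
-1 invalid
EOF
  [ "$failures" -eq 0 ] && echo "all tests passed"
}

# Step 2: the smallest implementation that makes the tests pass.
validate_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  # Length guard keeps oversized numbers out of the integer comparisons.
  [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

test_validate_port
```

The point of the ordering is that the test table, not the prompt wording, defines "done": the model can loop on it without a human checking each attempt.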
Multi-step plan:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
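The step → verify pattern in the plan above can be sketched as a small runner. This is an illustrative assumption, not tooling shipped with the repo: `run_step` and `run_plan` are hypothetical names, and the demo commands are placeholders.

```bash
# Hypothetical helper: run one step, then its verification check.
run_step() {
  local name="$1" step="$2" check="$3"
  echo "step: $name"
  if ! bash -c "$step"; then
    echo "FAILED (step): $name"
    return 1
  fi
  if ! bash -c "$check"; then
    echo "FAILED (verify): $name"
    return 1
  fi
  echo "verified: $name"
}

# Arguments come in triples: name, step command, check command.
# The plan stops at the first failure, so later steps never build
# on an unverified state.
run_plan() {
  while [ "$#" -ge 3 ]; do
    run_step "$1" "$2" "$3" || return 1
    shift 3
  done
  echo "plan complete"
}

# Demo in a scratch directory (placeholder commands).
workdir="$(mktemp -d)"
run_plan \
  "create config" "echo 'port=8080' > '$workdir/app.conf'" "grep -q '^port=' '$workdir/app.conf'" \
  "add log dir"   "mkdir -p '$workdir/logs'"               "test -d '$workdir/logs'"
rm -rf "$workdir"
```

In practice the check would usually be a test command for the project, so each step carries its own success criterion.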
Karpathy: “LLMs are exceptionally good at looping until they meet specific goals… Don’t tell it what to do, give it success criteria and watch it go.”
🧩 Signs it’s working
- Fewer unnecessary changes in diffs (only what was requested)
- Code is simple from the start, not after a refactor
- Clarifying questions before implementation, not after mistakes
- Clean, minimal PRs, no drive-by refactoring
🧩 Tradeoff
The guidelines bias toward caution over speed. For trivial tasks (a typo fix, a one-liner), relax the full rigor. The goal is to limit costly mistakes on non-trivial work, not to slow everything down.
✍️ Reasoning for
From my perspective this is a good baseline CLAUDE.md to bolt on next to existing project-specific instructions. Three of the four rules overlap with what I already do manually (“don’t add features beyond what’s asked”, “match existing style”, “surgical changes”). The fourth — Goal-Driven — is exactly what’s missing in most of my prompts: instead of “add X”, I write “write a test that fails when X is missing, then make it pass”. That turns the LLM from an executor into an autonomous worker.
For my setup it’s worth cherry-picking rule #4 into my main ~/.claude/CLAUDE.md, since rules #1–3 are already partially covered via Context Engineering and Claude Code Best Practice. For a structural alternative that enforces these rules through workflow gates, see Archon.
Alternatives considered
- CLAUDE.md from scratch — more tailored, but more effort
- Awesome Claude Code — resource curation, but it’s a catalog, not a ready-made file
- Agent Skills — skills are more granular (load on demand), CLAUDE.md is always-on
🔗 Resources
- Karpathy observations (X): https://x.com/karpathy/status/2015883857489522876
- Multica (related project by the author, Jiayuan): https://github.com/multica-ai/multica
- Multica-ai mirror of the skills repo: https://github.com/multica-ai/andrej-karpathy-skills
- Author X handle: https://x.com/jiayuan_jy
- License: MIT
- Superpowers — a methodology framework that enforces Karpathy-style discipline via 7 mandatory skills
- gstack — the opposite extreme (23+ opinionated role-skills instead of 1 CLAUDE.md)