andrej-karpathy-skills - A Guide to Improving Claude Code's Behavior

Tags: Claude Code, AI tools, development guidelines, LLM

2026-04-14

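Introduction

andrej-karpathy-skills is a Claude Code enhancement plugin developed by forrestchang, derived from Andrej Karpathy's observations on the pitfalls of coding with LLMs. The project provides four core principles that help AI coding assistants avoid common mistakes and improve code quality and development efficiency.

As of this writing, the project has 25,757 stars and 2,086 forks, making it one of the most popular tools in the Claude Code ecosystem.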

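What problem does it solve?

In a Twitter post, Andrej Karpathy pointed out several typical problems LLMs show when programming:

Problem	Description
Wrong assumptions	The model makes assumptions on its own without verifying them, and does not manage its own confusion
Overcomplication	Tends to overcomplicate code and APIs and to create redundant abstractions
Side-effect changes	Modifies or deletes content it does not fully understand, including code orthogonal to the task
No trade-offs shown	Does not present alternative approaches and does not push back when it should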

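The project addresses these problems directly with four principles:

Think Before Coding - state assumptions explicitly, present multiple interpretations, ask when confused
Simplicity First - solve the problem with the least code, no speculative features
Surgical Changes - change only what must change, no "while I'm here" optimizations
Goal-Driven Execution - define success criteria, then verify in a loop until they are met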

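Core concepts

Overview of the four principles

Principle	Core idea	Problems addressed
Think Before Coding	Don't assume, don't hide confusion, show trade-offs	Wrong assumptions, hidden problems
Simplicity First	Solve the problem with the least code, nothing speculative	Over-engineering, redundant abstractions
Surgical Changes	Touch only what must change, clean up orphans you created	Orthogonal modifications, incidental changes
Goal-Driven Execution	Define success criteria, verify in a loop	Vague goals, unverifiable results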

Installation

Option A: Claude Code plugin (recommended)

# Add the marketplace
/plugin marketplace add forrestchang/andrej-karpathy-skills

# Install the plugin
/plugin install andrej-karpathy-skills@karpathy-skills

Option B: CLAUDE.md (project level)

New project:

curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

Existing project (append):

echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
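
Running the append command twice would leave two copies of the guidelines in CLAUDE.md. As a rough sketch of one way to avoid that, the Python below appends the text only when it is not already present. The helper names `fetch_guidelines` and `append_guidelines` are hypothetical and not part of the project; the URL is the one from the curl commands above.

```python
from pathlib import Path
from urllib.request import urlopen

# URL copied from the curl commands above.
GUIDELINES_URL = (
    "https://raw.githubusercontent.com/forrestchang/"
    "andrej-karpathy-skills/main/CLAUDE.md"
)


def fetch_guidelines(url: str = GUIDELINES_URL) -> str:
    """Download the guidelines file as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def append_guidelines(target: Path, guidelines: str) -> bool:
    """Append guidelines to target unless they are already there.

    Hypothetical helper for illustration. Returns True if the file changed.
    """
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if guidelines.strip() in existing:
        return False  # already installed, skip the duplicate copy
    # Mirror `echo "" >> CLAUDE.md`: one newline between old and new content.
    separator = "\n" if existing else ""
    target.write_text(existing + separator + guidelines, encoding="utf-8")
    return True
```

Typical use would be `append_guidelines(Path("CLAUDE.md"), fetch_guidelines())`, run from the project root.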

Usage

Principle 1: Think Before Coding

Don't assume, don't hide confusion, show trade-offs.

## Think Before Coding

Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.

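Example: the user asks to "add a feature to export user data"

Mistakes LLMs commonly make:

Assuming all users should be exported (what about pagination? privacy?)
Assuming a file location
Assuming which fields to include

The right approach is to clarify first:

1. **Scope**: Export all users, or a filtered subset?
2. **Format**: A file download, a background job, or an API endpoint?
3. **Fields**: Which user fields should be included? (Some may be sensitive.)
4. **Volume**: How many users are there typically? (This affects the implementation.)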
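
Once those questions are answered, the answers can show up as explicit parameters instead of silent defaults. The sketch below assumes, purely for illustration, that the user asked for a CSV download of a filtered subset with a chosen list of fields. The function name `export_users_csv` and the sample field names are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, Mapping, Sequence

User = Mapping[str, object]


def export_users_csv(
    users: Iterable[User],
    fields: Sequence[str],
    include: Callable[[User], bool] = lambda user: True,
) -> str:
    """Return CSV text with only the requested fields of the matching users."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any field the caller did not ask for,
    # so sensitive columns are excluded unless listed explicitly.
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for user in users:
        if include(user):
            writer.writerow(user)
    return buffer.getvalue()


users = [
    {"id": 1, "email": "a@example.com", "password_hash": "x", "active": True},
    {"id": 2, "email": "b@example.com", "password_hash": "y", "active": False},
]
print(export_users_csv(users, ["id", "email"], include=lambda u: u["active"]))
```

Scope (`include`) and fields (`fields`) are decided by the caller here, which is the point of asking first.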

Principle 2: Simplicity First

Solve the problem with the least code, no speculative features.

## Simplicity First

- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.

Counterexample: a simple discount calculation gets over-engineered

# Over-designed version (30+ lines)
from abc import ABC, abstractmethod
class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass
# ... full strategy-pattern implementation

# The right approach
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount. percent should be 0-100."""
    return amount * (percent / 100)
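
One practical benefit of the short version is that it can be checked in a few lines. A minimal sketch, repeating the function so the block runs on its own; the sample amounts are made up for illustration.

```python
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount. percent should be 0-100."""
    return amount * (percent / 100)


# Three spot checks cover the whole behavior of the function.
assert calculate_discount(80.0, 25) == 20.0   # a quarter off
assert calculate_discount(80.0, 0) == 0.0     # no discount
assert calculate_discount(80.0, 100) == 80.0  # full discount
```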

Principle 3: Surgical Changes

Touch only what must change, and clean up the orphaned code you created.

## Surgical Changes

When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.

When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.

The test: every changed line should trace directly back to the user's request.
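
One way to apply this test during review is to list exactly which lines a change touched and check each one against the request. The sketch below uses the standard library's `difflib`; the helper name `changed_lines` and the sample snippet are hypothetical.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the removed ("-") and added ("+") lines between two versions."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="", n=0
        )
    )
    # The first two entries are the "---" / "+++" file headers;
    # hunk headers start with "@@" and are filtered out below.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


before = "import os\n\ndef greet(name):\n    print('hi ' + name)\n"
after = "import os\n\ndef greet(name):\n    print('hello ' + name)\n"

# Request: "change the greeting to hello". Only one line should differ.
for line in changed_lines(before, after):
    print(line)
```

If the list contains lines unrelated to the request, such as reformatted neighbors or reworded comments, the change was not surgical.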

Principle 4: Goal-Driven Execution

Define success criteria, then verify in a loop until they are met.

## Goal-Driven Execution

Transform tasks into verifiable goals:
- "Add validation" → "Write tests for invalid inputs, then make them pass"
- "Fix the bug" → "Write a test that reproduces it, then make it pass"
- "Refactor X" → "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:
1. [Step] → verify: [check]
2. [Step] → verify: [check]

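Bad vs. good:

Bad	Good
"Fix the authentication system"	"Write a test: old sessions are invalidated after a password change → verify the test fails → fix → verify the test passes"
"Add rate limiting"	"In steps: 1. In-memory rate limiting (single endpoint) → verify → 2. Middleware → verify → 3. Redis → verify"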

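The "old sessions are invalidated after a password change" goal from the table can be written down as a test before any fix is attempted. The sketch below is a toy: `AuthService` is a hypothetical in-memory stand-in (plain-text passwords, no expiry) that exists only to make the test concrete, and is not how a real authentication system should store credentials.

```python
import secrets


class AuthService:
    """Toy in-memory auth store, used only to make the test concrete."""

    def __init__(self) -> None:
        self._passwords: dict[str, str] = {}
        self._sessions: dict[str, str] = {}  # session token -> username

    def register(self, username: str, password: str) -> None:
        self._passwords[username] = password

    def login(self, username: str, password: str) -> str:
        if self._passwords.get(username) != password:
            raise PermissionError("invalid credentials")
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def is_session_valid(self, token: str) -> bool:
        return token in self._sessions

    def change_password(self, username: str, new_password: str) -> None:
        self._passwords[username] = new_password
        # The fix under test: drop every session that belongs to this user.
        self._sessions = {
            token: user
            for token, user in self._sessions.items()
            if user != username
        }


def test_old_session_invalid_after_password_change() -> None:
    auth = AuthService()
    auth.register("alice", "old-secret")
    old_token = auth.login("alice", "old-secret")
    assert auth.is_session_valid(old_token)

    auth.change_password("alice", "new-secret")

    assert not auth.is_session_valid(old_token)
    assert auth.is_session_valid(auth.login("alice", "new-secret"))


test_old_session_invalid_after_password_change()
```

Removing the session cleanup in `change_password` makes the test fail, which is the "verify the test fails" step of the loop.
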
Practical examples

Multi-step task planning example

User request: "Add rate limiting to the API"

A good plan:

Plan for rate limiting:

1. Add basic in-memory rate limiting (single endpoint)
   Verify:
   - Test: 100 requests → first 10 succeed, rest get 429
   - Manual: curl endpoint 11 times, see rate limit error

2. Extract to middleware (apply to all endpoints)
   Verify:
   - Test: Rate limits apply to /users and /posts
   - Existing endpoint tests still pass

3. Add Redis backend (for multi-server)
   Verify:
   - Test: Rate limit persists across app restarts

4. Add configuration (rates per endpoint)
   Verify:
   - Test: /search allows 10/min, /users allows 100/min

Each step is independently verifiable and deployable.
Start with step 1?
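
Step 1 of this plan and its first check can be sketched as follows. This is one possible shape, assuming a fixed-window policy of 10 requests per 60 seconds; the plan itself does not specify the algorithm, and the class name `InMemoryRateLimiter` and the injectable clock are hypothetical choices made so the check runs without waiting.

```python
import time
from typing import Callable


class InMemoryRateLimiter:
    """Fixed-window limiter: at most `limit` requests per window per key."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # key -> (window start time, requests counted in that window)
        self._windows: dict[str, tuple[float, int]] = {}

    def check(self, key: str) -> int:
        """Return the HTTP status for one request: 200 allowed, 429 limited."""
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired, start a new one
        if count >= self.limit:
            return 429
        self._windows[key] = (start, count + 1)
        return 200


# Verify step 1: 100 requests -> first 10 succeed, the rest get 429.
fake_now = [0.0]
limiter = InMemoryRateLimiter(limit=10, window_seconds=60, clock=lambda: fake_now[0])
statuses = [limiter.check("client-1") for _ in range(100)]
assert statuses[:10] == [200] * 10
assert statuses[10:] == [429] * 90
```

Keeping the limiter behind a single `check` method leaves room for step 2 (middleware) and step 3 (a Redis backend) without changing this test.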

Bug fix example

User report: "Sorting breaks when there are duplicate scores"

The right workflow:

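# 1. First write a test that reproduces the problem
def test_sort_with_duplicate_scores():
    scores = [
        {'name': 'Alice', 'score': 100},
        {'name': 'Bob', 'score': 100},
        {'name': 'Charlie', 'score': 90},
    ]
    result = sort_scores(scores)
    # Bug: the order of entries with duplicate scores is not deterministic
    assert result[0]['score'] == 100
    assert result[1]['score'] == 100
    # Pin the tie order by name; checking scores alone passes for either order
    assert [r['name'] for r in result] == ['Alice', 'Bob', 'Charlie']

# Verify: run the test 10 times → the ordering is inconsistent (bug reproduced)

# 2. Fix by adding a deterministic tie-breaker (name) to the sort key
def sort_scores(scores):
    return sorted(scores, key=lambda x: (-x['score'], x['name']))

# Verify: the test passes consistently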
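
To see why the name tie-breaker matters, it helps to feed the same records in every possible input order and count how many distinct results come back. The sketch below compares the fixed function with a score-only sort; `naive_sort_scores` and `distinct_orderings` are hypothetical names, and the score-only version is an assumed stand-in for the buggy code, which the report does not show.

```python
from itertools import permutations


def sort_scores(scores):
    """Sort by score descending, breaking ties by name ascending."""
    return sorted(scores, key=lambda x: (-x["score"], x["name"]))


def naive_sort_scores(scores):
    """Score-only sort: tied entries keep whatever order the input had."""
    return sorted(scores, key=lambda x: -x["score"])


def distinct_orderings(sort_fn, records):
    """Collect the name orderings produced across every input permutation."""
    return {
        tuple(r["name"] for r in sort_fn(list(p)))
        for p in permutations(records)
    }


scores = [
    {"name": "Alice", "score": 100},
    {"name": "Bob", "score": 100},
    {"name": "Charlie", "score": 90},
]

print(len(distinct_orderings(naive_sort_scores, scores)))  # result depends on input order
print(len(distinct_orderings(sort_scores, scores)))        # one result for any input order
```

Python's `sorted` is already stable, so the score-only version is deterministic for a fixed input; the output only varies when the input order does, for example when rows arrive from an unordered query.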

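Getting more out of it

Combining with project conventions

These guidelines are designed to be merged with project-specific instructions. Add the following to your existing CLAUDE.md: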

## Project-Specific Guidelines

- Use TypeScript strict mode
- All API endpoints must have tests
- Follow the existing error handling patterns in `src/utils/errors.ts`

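Judging when they apply

These guidelines favor caution over speed. For simple tasks (typo fixes, obvious one-line changes), use judgment: not every change needs the full process.

The goal is to reduce costly mistakes in non-trivial work, not to slow down simple tasks.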

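Verifying the effect

After adopting these guidelines you should see:

Cleaner diffs - only the requested changes appear
Fewer over-rewrites - the code is simple enough the first time
Clarifying questions before implementation - not after mistakes
Cleaner PRs - no drive-by refactoring or "improvements"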