io.github.ofershap/ai-context-kit

Coding & Debugging

by ofershap

Lint, measure, and sync AI context files across Cursor, Claude Code, and Copilot


README

<p align="center"> <img src="assets/logo.png" alt="ai-context-kit" width="120" height="120" /> </p> <h1 align="center">ai-context-kit</h1> <p align="center"> <strong>How do you measure the token cost of your context?</strong> </p> <p align="center"> You spent hours writing the perfect .md context file, just to find out that your agent got <em>worse</em>.<br> That's not a bug. That's what happens when nobody measures the cost of context. </p> <p align="center"> <a href="#quick-start"><img src="https://img.shields.io/badge/Try_It_Now-22c55e?style=for-the-badge&logoColor=white" alt="Try It Now" /></a> &nbsp; <a href="#quick-start"><img src="https://img.shields.io/badge/Install-3b82f6?style=for-the-badge&logoColor=white" alt="Install" /></a> &nbsp; <a href="#quick-start"><img src="https://img.shields.io/badge/See_Example_Output-8b5cf6?style=for-the-badge&logoColor=white" alt="See Example Output" /></a> &nbsp; <a href="https://github.com/ofershap/ai-context-kit/discussions/1"><img src="https://img.shields.io/badge/Vote_on_Next_Features-f97316?style=for-the-badge&logoColor=white" alt="Vote on Next Features" /></a> </p> <p align="center"> <a href="https://github.com/ofershap/ai-context-kit/stargazers"><img src="https://img.shields.io/github/stars/ofershap/ai-context-kit?style=social" alt="GitHub stars" /></a> &nbsp; <a href="https://www.npmjs.com/package/ai-context-kit"><img src="https://img.shields.io/npm/v/ai-context-kit.svg" alt="npm version" /></a> <a href="https://www.npmjs.com/package/ai-context-kit"><img src="https://img.shields.io/npm/dm/ai-context-kit.svg" alt="npm downloads" /></a> <a href="https://github.com/ofershap/ai-context-kit/actions/workflows/ci.yml"><img src="https://github.com/ofershap/ai-context-kit/actions/workflows/ci.yml/badge.svg" alt="CI" /></a> <a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-strict-blue" alt="TypeScript" /></a> <a href="https://opensource.org/licenses/MIT"><img 
src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT" /></a> <a href="https://makeapullrequest.com"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg" alt="PRs Welcome" /></a> </p>

You write a CLAUDE.md. Then someone adds .cursor/rules/. Then a teammate drops in an AGENTS.md. Then someone copies in a .cursorrules file from a blog post. Nobody removes the old ones.

Six months later your project has four context files that overlap, contradict each other, and dump 8,000 tokens of directory listings and "follow best practices" into every conversation. Your agent follows all of it. It gets slower. It gets confused. You blame the model.

An ETH Zurich study (February 2026) measured what actually happens when you give agents context files:

  • Auto-generated context files reduced task success compared to providing nothing
  • Human-written ones only improved accuracy by 4%
  • Inference costs jumped 20%+ from wasted tokens
  • Performance dropped on some models because agents got too obedient - following unnecessary instructions instead of solving the actual problem

I kept hitting this in my own projects, so I built ai-context-kit - a toolkit to treat context like a budget. Measure it, trim it, inject only what the current task needs.

```typescript
import { loadRules, measure, lint, select } from "ai-context-kit";

const rules = await loadRules("./");

measure(rules, 4000); // what does your context cost?
lint(rules); // conflicts? duplicates? dead weight?
select(rules, {
  task: "fix auth bug", // only inject what matters
  budget: 2000, // stay within token budget
});
```

Quick Start

```bash
npm install ai-context-kit
```

Run the CLI on any project to see what you're actually injecting:

```bash
npx ai-context-kit measure
```

```text
ai-context-kit measure - 6 rule file(s)

  Total: 4,821 tokens

  ############ 2,100 tokens (44%) - .cursor/rules/conventions.mdc
  ######## 1,200 tokens (25%) - CLAUDE.md
  ##### 890 tokens (18%) - .cursor/rules/api-patterns.mdc
  ## 340 tokens (7%) - AGENTS.md
  ## 180 tokens (4%) - .cursor/rules/testing.mdc
  # 111 tokens (2%) - .github/copilot-instructions.md
```

Then lint it:

```bash
npx ai-context-kit lint
```

```text
ai-context-kit lint - 6 rule file(s)

  [!] .cursor/rules/conventions.mdc
      Rule is 2100 tokens. Consider splitting to keep each file under 2000 tokens.

  [x] CLAUDE.md
      Conflicts with AGENTS.md: "always use semicolons" vs "never use semicolons"

  [!] CLAUDE.md
      Duplicated line also found in .cursor/rules/conventions.mdc. Duplicates waste tokens.

  [i] AGENTS.md
      Contains vague instruction matching "follow best practices".
      Specific instructions produce better results than general advice.

  Score: 70/100 (FAILED)
```

That's the difference between guessing and knowing.


What's Different

|  | Other approaches | ai-context-kit |
| --- | --- | --- |
| Context cost | Nobody measures it | Token count per file with budget check |
| Conflicts | You find out when the agent does something weird | Detects contradictions across all files automatically |
| Duplicates | Same rule in 3 files, 3x the tokens | Flagged and scored |
| Task relevance | Every rule injected every time | `select()` picks only what matters for the current task |
| Multi-tool | Locked to one IDE's format | Works across Cursor, Claude Code, Copilot, Windsurf, Cline |
| CI | Hope for the best | `lint` exits with code 1 on errors. Drop it in your pipeline |

What This Answers

  1. How much context am I injecting? Token count per file, percentage breakdown, budget check
  2. Are my rules fighting each other? Conflict detection across all files and formats
  3. What's wasting tokens? Directory listings, duplicate content, vague advice
  4. Which rules matter for this task? Task-relevant selection with token budget

How It Works

ai-context-kit reads every context file format in the ecosystem, parses frontmatter, estimates token cost, and gives you tools to analyze and manage them.

| Function | What it does |
| --- | --- |
| `loadRules()` | Auto-detects `.cursor/rules/`, `.cursorrules`, `CLAUDE.md`, `AGENTS.md`, `copilot-instructions.md`, `.windsurfrules`, `.clinerules` |
| `measure()` | Token cost per rule, percentage of total, budget check |
| `lint()` | Conflicts, duplicates, bloat, vague instructions, useless directory trees. Scores 0-100 |
| `select()` | Picks rules relevant to the current task. Respects a token budget. `alwaysApply` rules first, then by relevance |
| `sync()` | Single source of truth. Write once in `.cursor/rules/`, sync to `CLAUDE.md`, `AGENTS.md`, and the rest |
| `init()` | Starter template with tips from the research |

API

loadRules(rootDir?)

```typescript
// Finds every context file in the project
const rules = await loadRules("./");

// Or load from a specific directory
const cursorRules = await loadRules(".cursor/rules/");
```
Returns RuleFile[] with parsed frontmatter, body, format, path, and token count.
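The returned shape can be sketched as an interface based on the fields listed above. The exact field names in ai-context-kit's `RuleFile` type are assumptions here and may differ.

```typescript
// Hypothetical sketch of the RuleFile shape described in the text;
// field names beyond those listed above are guesses, not the real type.
interface RuleFile {
  path: string; // e.g. ".cursor/rules/conventions.mdc"
  format: string; // detected from the file path
  frontmatter: Record<string, unknown>; // parsed frontmatter, {} if none
  body: string; // rule text with frontmatter stripped
  tokens: number; // estimated token count
}

// A minimal example value conforming to the sketch:
const example: RuleFile = {
  path: "CLAUDE.md",
  format: "claude-md",
  frontmatter: {},
  body: "Always use semicolons.",
  tokens: 6,
};
```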

measure(rules, budget?)

```typescript
const report = measure(rules, 4000);

report.totalTokens; // 3847
report.overBudget; // false
report.rules; // sorted by size, each with tokens + percentage
```

lint(rules)

```typescript
const report = lint(rules);

report.score; // 85/100
report.passed; // true (no errors; warnings don't fail)
report.issues; // array of { rule, path, severity, message }
```

What the linter catches:

| Rule | Severity | What it finds |
| --- | --- | --- |
| `token-budget` | warning/error | Files over 2,000 tokens (warning) or 5,000 (error) |
| `empty-rule` | warning | Files too short to do anything |
| `duplicate-content` | warning | Same instruction repeated across files |
| `conflict` | error | "always use X" in one file, "never use X" in another |
| `directory-listing` | warning | 10+ line directory trees that agents don't need |
| `vague-instruction` | info | "follow best practices", "write clean code", "be consistent" |
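The conflict rule can be illustrated with a naive, self-contained sketch that pairs "always use X" with "never use X" across files. This is not the library's actual implementation, just a minimal model of the idea; `findConflicts` is a hypothetical name.

```typescript
// Naive conflict detection: flag a subject that one file says to
// "always use" and another says to "never use". The real linter is
// likely more nuanced about phrasing and negation.
function findConflicts(files: { path: string; body: string }[]): string[] {
  // Collect all capture groups for a global regex.
  const collect = (re: RegExp, text: string): string[] => {
    const out: string[] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) out.push(m[1]);
    return out;
  };

  const always = new Map<string, string>(); // subject -> file path
  const never = new Map<string, string>();
  for (const f of files) {
    const text = f.body.toLowerCase();
    for (const s of collect(/always use ([\w-]+)/g, text)) always.set(s, f.path);
    for (const s of collect(/never use ([\w-]+)/g, text)) never.set(s, f.path);
  }

  const conflicts: string[] = [];
  always.forEach((aPath, subject) => {
    const nPath = never.get(subject);
    if (nPath) {
      conflicts.push(`"always use ${subject}" (${aPath}) vs "never use ${subject}" (${nPath})`);
    }
  });
  return conflicts;
}
```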

select(rules, options)

The core insight from the research: don't inject everything. Pick what matters.

```typescript
const relevant = select(rules, {
  task: "fix auth bug in /api/auth",
  budget: 2000,
  tags: ["security", "api"],
  exclude: ["style"],
});
```

Scoring: alwaysApply: true in frontmatter gets highest priority. Then task words matched against file paths and content. Then tag matches. Budget is respected - highest-scored rules are included first until the budget runs out.
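The ordering described above can be sketched as a simplified, self-contained scorer. This is an illustration of the strategy, not the library's code; `selectSketch` and the `Rule` shape are assumptions, and tag matching is omitted for brevity.

```typescript
// Simplified selection: alwaysApply rules first, then rules ranked by how
// many task words appear in their path or body, cut off at the token budget.
type Rule = { path: string; body: string; tokens: number; alwaysApply?: boolean };

function selectSketch(rules: Rule[], task: string, budget: number): Rule[] {
  const words = task.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  const score = (r: Rule): number =>
    words.filter((w) => r.path.toLowerCase().includes(w) || r.body.toLowerCase().includes(w)).length;

  const ordered = rules.slice().sort((a, b) => {
    if (!!a.alwaysApply !== !!b.alwaysApply) return a.alwaysApply ? -1 : 1;
    return score(b) - score(a);
  });

  const picked: Rule[] = [];
  let spent = 0;
  for (const r of ordered) {
    if (spent + r.tokens > budget) continue; // would bust the budget: skip
    picked.push(r);
    spent += r.tokens;
  }
  return picked;
}
```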

sync(options)

Write rules once, sync everywhere.

```typescript
await sync({
  source: ".cursor/rules/",
  targets: ["CLAUDE.md", "AGENTS.md", ".github/copilot-instructions.md"],
});
```

Supports dryRun: true to preview changes without writing.
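What a sync pass computes can be sketched as a pure function, assuming a naive merge: the real `sync()` likely preserves per-file structure and frontmatter, and `planSync` is a hypothetical name, not part of the API.

```typescript
// Hypothetical planSync: merge source rule bodies (sorted by filename for
// determinism) and report the content each target file would receive.
// This is the "preview" half of a dry run; writing files is the other half.
function planSync(sourceFiles: Record<string, string>, targets: string[]): Record<string, string> {
  const merged = Object.keys(sourceFiles)
    .sort()
    .map((name) => sourceFiles[name])
    .join("\n\n");
  const plan: Record<string, string> = {};
  for (const t of targets) plan[t] = merged;
  return plan;
}
```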

init(options?)

```typescript
await init({ format: "cursor-rules" });
// Creates .cursor/rules/conventions.mdc with a research-backed starter template
```

CLI

```bash
npx ai-context-kit lint                    # find issues
npx ai-context-kit lint --json             # machine-readable output
npx ai-context-kit measure                 # token cost breakdown
npx ai-context-kit measure --budget 4000   # check against budget
npx ai-context-kit sync --source .cursor/rules/ --target CLAUDE.md,AGENTS.md
npx ai-context-kit init                    # scaffold starter rules
npx ai-context-kit init --format claude-md
```

All commands support --path <dir> to point at a different project root. lint exits with code 1 on errors (warnings pass).


Use with Vercel AI SDK / LangChain / Custom Agents

This isn't just for Cursor. If you're building agents with Vercel AI SDK, LangChain, or your own framework, ai-context-kit solves the same problem: how much context are you stuffing into the system prompt, and is it helping or hurting?

```typescript
import { loadRules, select } from "ai-context-kit";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

// In a real app, userMessage comes from your request handler or chat input
const userMessage = "fix auth bug in /api/auth";

const allRules = await loadRules("./rules");

const relevant = select(allRules, {
  task: userMessage,
  budget: 3000,
});

const systemPrompt = relevant.map((r) => r.body).join("\n\n");

const { text } = await generateText({
  model: openai("gpt-4o"),
  system: systemPrompt,
  prompt: userMessage,
});
```

Any framework that takes a system prompt string. Any rules stored as markdown files.


Supported Formats

| Format | File | Used by |
| --- | --- | --- |
| Cursor (modern) | `.cursor/rules/*.mdc` | Cursor IDE |
| Cursor (legacy) | `.cursorrules` | Cursor IDE |
| Claude Code | `CLAUDE.md` | Claude Code |
| AGENTS.md | `AGENTS.md` | Cross-agent standard |
| GitHub Copilot | `.github/copilot-instructions.md` | GitHub Copilot |
| Windsurf | `.windsurfrules` | Windsurf |
| Cline | `.clinerules` | Cline |

ai-context-kit detects the format from the file path. No configuration needed.
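Path-based detection can be illustrated with a small lookup. The format identifiers returned here are made-up labels for the sketch, not necessarily the strings ai-context-kit uses internally.

```typescript
// Illustration of detecting a context-file format purely from its path.
function detectFormat(path: string): string | null {
  if (/\.cursor\/rules\/[^/]+\.mdc$/.test(path)) return "cursor-rules";
  if (path.endsWith(".cursorrules")) return "cursorrules";
  if (path.endsWith("CLAUDE.md")) return "claude-md";
  if (path.endsWith("AGENTS.md")) return "agents-md";
  if (path.endsWith(".github/copilot-instructions.md")) return "copilot";
  if (path.endsWith(".windsurfrules")) return "windsurf";
  if (path.endsWith(".clinerules")) return "cline";
  return null; // not a recognized context file
}
```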


<details> <summary><strong>Why not just write better rules?</strong></summary>

The ETH Zurich study tested both human-written and LLM-generated context files. Human-written ones were better, but only by 4%. The real problem isn't quality - it's volume. More context means more tokens consumed by instructions the agent doesn't need for the current task. The winning strategy is fewer, task-relevant rules, not better prose.

</details> <details> <summary><strong>How accurate is the token estimation?</strong></summary>

ai-context-kit uses a 4-character-per-token approximation. This is intentionally simple and fast. It's accurate enough for budgeting and comparison (GPT-4 averages ~4 chars/token for English text). If you need exact counts, pipe the output through tiktoken or your model's tokenizer.
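The heuristic is small enough to restate in code. This mirrors the documented 4-characters-per-token rule; the library's exact rounding behavior is an assumption.

```typescript
// ~4 characters per English token, rounded up so short strings still count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```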

</details> <details> <summary><strong>Does this work in CI?</strong></summary>

Yes. npx ai-context-kit lint returns exit code 1 on errors, 0 on pass. Add it to your CI pipeline the same way you'd add eslint. The --json flag gives machine-readable output for custom reporting.

</details>

Tech Stack

| Component | Technology |
| --- | --- |
| Language | TypeScript, strict mode |
| Testing | Vitest |
| Bundler | tsup, ESM + CJS |
| Dependencies | Zero runtime dependencies |

Contributing

PRs welcome. Whether it's a new lint rule, a format detector, or a bug fix - check out the contributing guide.


Author

Made by ofershap

LinkedIn GitHub


<sub>README built with README Builder</sub>

License

MIT © Ofer Shapira


<p align="center"> <a href="https://github.com/ofershap/ai-context-kit">Star this repo</a> · <a href="https://github.com/ofershap/ai-context-kit/fork">Fork it</a> · <a href="https://github.com/ofershap/ai-context-kit/issues">Report a bug</a> · <a href="https://github.com/ofershap/ai-context-kit/discussions">Join the discussion</a> </p>

评论