io.github.userFRM/rpg-encoder

by userfrm

Builds a semantic code graph with tree-sitter and MCP to help AI assistants understand code structure and semantic relationships.

README

<h1 align="center">rpg-encoder</h1> <p align="center"> <strong>Give your AI agent a brain for your codebase.</strong> </p> <p align="center"> <a href="https://github.com/userFRM/rpg-encoder/actions"><img src="https://github.com/userFRM/rpg-encoder/workflows/CI/badge.svg" alt="CI"></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="MIT License"></a> <a href="https://www.rust-lang.org"><img src="https://img.shields.io/badge/rust-1.85%2B-orange.svg?style=flat-square" alt="Rust 1.85+"></a> <a href="https://www.npmjs.com/package/rpg-encoder"><img src="https://img.shields.io/npm/v/rpg-encoder?style=flat-square" alt="npm"></a> <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-compatible-green.svg?style=flat-square" alt="MCP"></a> <a href="https://github.com/userFRM/rpg-encoder/stargazers"><img src="https://img.shields.io/github/stars/userFRM/rpg-encoder?style=flat-square" alt="Stars"></a> </p> <br>

AI coding agents waste most of their tool calls fumbling through your codebase with grep, cat, find, and file reads. rpg-encoder fixes that. It builds a semantic graph of your code with Tree-sitter — not just what calls what, but what every function does — and gives your AI assistant whole-repo understanding via MCP in a single tool call.

<p align="center"> <img src="diagrams/hero-tool-waste.webp" alt="Without RPG: 34,000 chaotic grep/cat/find calls. With RPG: one semantic_snapshot call returns a structured map of the whole repo." width="90%" /> </p>

Quick Start

```bash
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
```

One command. Works with Claude Code, Cursor, opencode, Windsurf, or any MCP-compatible agent. No Rust toolchain, no cloning, no building — npx downloads a pre-built binary for your platform.

Then open any repo and tell your agent:

"Build and lift the RPG for this repo"

Your agent handles everything: indexes entities (seconds), reads each function and adds intent-level features (a few minutes), organizes them into a semantic hierarchy, and commits .rpg/graph.json for your team.

For repos with ~100+ entities, lifting_status will tell your agent to delegate the lifting loop to a sub-agent or a cheaper model — feature extraction is pattern-matching, not novel reasoning. If your runtime has no sub-agent mechanism, run rpg-encoder lift --provider anthropic|openai from the terminal with an API key — the CLI drives an external LLM directly with no agent involvement. After the CLI finishes, call reload_rpg in your session to load the updated graph. The CLI lifts entities with no features; re-lifting stale entities (features present but outdated after code changes) is handled by the in-session MCP flow, not the CLI.
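
The division of labour described above can be pictured as a three-way partition of entities. The sketch below is illustrative only: the `Entity` fields, the hash comparison used to detect staleness, and the function names are assumptions made for this example, not rpg-encoder's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    entity_id: str                       # "file:name" form, as used by fetch_node
    source_hash: str                     # hypothetical: hash of the current source
    lifted_hash: Optional[str] = None    # hypothetical: source hash when features were written
    features: List[str] = field(default_factory=list)

def partition(entities):
    """Split entities into unlifted, stale, and fresh groups."""
    unlifted, stale, fresh = [], [], []
    for e in entities:
        if not e.features:
            unlifted.append(e)           # what the CLI `lift` command picks up
        elif e.lifted_hash != e.source_hash:
            stale.append(e)              # left to the in-session MCP flow
        else:
            fresh.append(e)
    return unlifted, stale, fresh

def should_delegate(entity_count, threshold=100):
    """Mirror the ~100+ entity rule of thumb for handing lifting to a sub-agent."""
    return entity_count >= threshold
```

Under these assumptions, an entity with features whose recorded hash no longer matches its source lands in `stale`, so it is skipped by the CLI and handled in-session.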

Once lifted, try:

  • "What handles authentication?" — finds code even when nothing is named "auth"
  • "Show everything that depends on the database connection"
  • "Plan a change to add rate limiting to API endpoints"

Use RPG before grep, cat, find

The server instructions tell your agent to reach for RPG tools FIRST for any question about code structure or behavior. That reflex matters — grep, cat, and ad-hoc file reads burn tokens and miss semantic relationships RPG already knows.

| If you'd otherwise reach for... | Use this instead |
| --- | --- |
| `grep -r` / `rg` (by intent) | `search_node(query="...")` |
| `grep -r` / `rg` (by name) | `search_node(query="...", mode="snippets")` |
| `cat` / reading a function | `fetch_node(entity_id="file:name")` |
| chained greps for callers/callees | `explore_rpg(entity_id="...", direction="...")` |
| recursive grep for "what depends on X" | `impact_radius(entity_id="...")` |
| `wc -l` / `find` / `tree` | `rpg_info` |
| reading many files for context | `semantic_snapshot` |
| manual search → fetch → explore chains | `context_pack(query="...")` |
| "how do I refactor X safely" | `plan_change(goal="...")` |

Fall back to grep, cat, or file reads only when the query is about literal text (string search, comments, TODOs, log messages) — not about structure.
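
For readers curious what these calls look like on the wire, here is a minimal sketch of the JSON-RPC 2.0 requests an MCP client sends via the protocol's `tools/call` method. The tool names and argument names come from the table above; the example query strings, the entity ID, and the `tool_call` helper are illustrative assumptions. In practice the agent's MCP client builds these requests for you.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator

def tool_call(name, **arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Search by intent, search by name, and a caller lookup (example values only)
by_intent = tool_call("search_node", query="validate JWT tokens")
by_name = tool_call("search_node", query="extract_entities", mode="snippets")
callers = tool_call(
    "explore_rpg",
    entity_id="src/parser.rs:extract_entities",
    direction="upstream",
)

print(json.dumps(by_name, indent=2))
```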


How It Works

<p align="center"> <img src="diagrams/how-it-works.webp" alt="Four-stage pipeline: Parse (tree-sitter) → Lift (verb-object features) → Organize (3-level hierarchy) → Understand (LLM gets full repo knowledge)" width="95%" /> </p>
  1. Parse: Tree-sitter extracts entities (functions, classes, methods) and dependency edges (imports, calls, inheritance) from 15 languages.
  2. Lift: An LLM (your agent, or a cheap API like Haiku) reads each entity and writes verb-object features: "validate JWT tokens", "serialize config to disk".
  3. Organize: Features cluster into a 3-level semantic hierarchy (Area → Category → Subcategory) that emerges from what the code does, not the file tree.
  4. Understand: semantic_snapshot compresses the whole graph into ~25K tokens. Your LLM reads it once and knows the repo.
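
To make the Organize step concrete, the sketch below groups a few lifted entities into an Area → Category → Subcategory tree. Everything here is hypothetical: the entity IDs, the hierarchy labels, and the `build_hierarchy` helper are invented for illustration, and the paths are hard-coded, whereas in the real pipeline they come from the lifting and hierarchy tools.

```python
from collections import defaultdict

# Hypothetical lifted entities: (entity_id, features, hierarchy path)
lifted = [
    ("src/auth.rs:check_token", ["validate JWT tokens"],
     ("Security", "Authentication", "Token validation")),
    ("src/config.rs:save", ["serialize config to disk"],
     ("Configuration", "Persistence", "Serialization")),
    ("src/auth.rs:refresh", ["refresh session tokens"],
     ("Security", "Authentication", "Session renewal")),
]

def build_hierarchy(items):
    """Nest entity IDs under area -> category -> subcategory."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for entity_id, _features, (area, category, subcategory) in items:
        tree[area][category][subcategory].append(entity_id)
    return tree

tree = build_hierarchy(lifted)
for area in sorted(tree):
    print(area, "->", sorted(tree[area]))
```

Note that the two functions from `src/auth.rs` end up in different subcategories: the grouping follows what the code does, not where the file lives.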

The semantic snapshot

<p align="center"> <img src="diagrams/semantic-snapshot.webp" alt="The whole repo — ~500K tokens of source — compressed 20x into a ~25K token snapshot containing hierarchy, features, dependencies, and hot spots" width="80%" /> </p>

Instead of grepping through files, the LLM calls semantic_snapshot once and receives:

  • Hierarchy — every functional area with aggregate features
  • Entities — every function, class, method grouped by area, with its semantic features
  • Dependency skeleton — condensed call graph with qualified names
  • Hot spots — top 10 most-connected entities (the architectural backbone)
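
One simple way to read "most-connected" is total degree in the dependency graph. The sketch below ranks entities that way; it is an assumption about the metric, not a description of how rpg-encoder actually scores hot spots, and the edge list is made up.

```python
from collections import Counter

def hot_spots(edges, top_n=10):
    """Rank entities by total degree (incoming plus outgoing edges)."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Sort by degree descending, then by name so ties are deterministic
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical dependency edges: (caller, callee)
edges = [("a", "b"), ("a", "c"), ("d", "a"), ("b", "c")]
print(hot_spots(edges, top_n=2))
```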

~25K tokens covers ~1000 entities. That's 2-3% of a 1M context window — the LLM starts every session already knowing your repo.
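
The figures quoted above can be checked with a few lines of arithmetic. The ~500K source-token figure is taken from the snapshot diagram's caption; all values are the README's approximations.

```python
snapshot_tokens = 25_000      # approximate snapshot size
entities = 1_000              # entities covered by that snapshot
context_window = 1_000_000    # 1M-token context window
source_tokens = 500_000       # approximate size of the raw source

tokens_per_entity = snapshot_tokens / entities     # average budget per entity
window_share = snapshot_tokens / context_window    # fraction of the window used
compression = source_tokens / snapshot_tokens      # source-to-snapshot ratio

print(f"{tokens_per_entity:.0f} tokens per entity")
print(f"{window_share:.1%} of the context window")
print(f"{compression:.0f}x compression")
```

So the snapshot averages about 25 tokens per entity, uses 2.5% of a 1M window, and is roughly 20 times smaller than the source it summarizes.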

Self-maintaining graph

<p align="center"> <img src="diagrams/auto-staleness.webp" alt="Git HEAD moves → RPG Server auto-syncs → update_rpg applies additions/modifications/removals → graph always fresh, zero agent action" width="80%" /> </p>

Whenever your working tree changes — committed, staged, or unstaged — the MCP server automatically re-syncs before responding to the next query. A changeset hash over (path, size, mtime) means repeated saves of the same file trigger one sync, and idle queries trigger none. Reverts are detected too: if a previously-dirty file returns to its HEAD state, the graph is restored.
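
A minimal sketch of the changeset-hash idea follows, assuming SHA-256 over the sorted `(path, size, mtime)` tuples of the dirty files. The hash function, the `SyncGate` class, and the counter standing in for a real re-sync are all assumptions for illustration; the server's actual implementation may differ.

```python
import hashlib

def changeset_hash(changed):
    """Hash a collection of (path, size, mtime) tuples, independent of order."""
    h = hashlib.sha256()
    for path, size, mtime in sorted(changed):
        h.update(f"{path}\0{size}\0{mtime}\n".encode())
    return h.hexdigest()

class SyncGate:
    """Re-sync only when the changeset hash differs from the last one seen."""

    def __init__(self):
        self.last = changeset_hash([])   # clean working tree
        self.syncs = 0

    def before_query(self, changed):
        current = changeset_hash(changed)
        if current != self.last:
            self.syncs += 1              # stand-in for the real graph re-sync
            self.last = current
        return self.syncs

gate = SyncGate()
dirty = [("src/a.rs", 120, 1000)]
gate.before_query(dirty)   # first query after an edit: one sync
gate.before_query(dirty)   # same changeset: no extra sync
gate.before_query([])      # file reverted to HEAD: one more sync to restore
print(gate.syncs)
```

Because the hash covers the whole dirty set, a revert shows up as a change back to the empty set, which is what lets the restored state be detected.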

Two ways to lift

| Mode | Command | Cost | Who pays |
| --- | --- | --- | --- |
| Agent lifting | "Build and lift the RPG" | Subscription tokens | Your Claude Code / Cursor subscription |
| Autonomous lifting | `auto_lift(provider="anthropic", api_key_env="ANTHROPIC_API_KEY")` | ~$0.02 per 100 entities | External API key (Haiku, GPT-4o-mini, OpenRouter, Gemini) |

auto_lift calls a cheap external LLM directly — your coding subscription never touches the lifting work. Use api_key_env to resolve keys from environment variables so they never appear in tool call transcripts.
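
The `api_key_env` pattern is easy to picture: the tool call carries only the name of an environment variable, and the secret is read server-side. The sketch below shows that lookup plus a rough cost estimate based on the ~$0.02 per 100 entities figure. Both helper functions are hypothetical and not part of rpg-encoder's API.

```python
import os

def resolve_api_key(api_key_env):
    """Look up the key by variable name so the secret never appears in call arguments."""
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    return key

def estimate_cost_usd(entity_count, usd_per_100=0.02):
    """Rough lifting cost using the README's approximate per-100-entity price."""
    return entity_count / 100 * usd_per_100

print(f"~${estimate_cost_usd(1000):.2f} for 1000 entities")
```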


Architecture

<p align="center"> <img src="diagrams/architecture.webp" alt="Your codebase (15 languages) → RPG Engine (5 Rust crates: parser, encoder, nav, lift, mcp) → Clients (Claude Code, Cursor, opencode) via MCP Protocol" width="95%" /> </p>

Seven Rust crates, one MCP server binary, one CLI binary:

| Crate | Role |
| --- | --- |
| rpg-core | Graph types (RPGraph, Entity, HierarchyNode), storage, LCA algorithm |
| rpg-parser | Tree-sitter entity + dependency extraction (15 languages) |
| rpg-encoder | Encoding pipeline, lifting utilities, incremental evolution |
| rpg-nav | Search, fetch, explore, snapshot, TOON serialization |
| rpg-lift | Autonomous LLM lifting (Anthropic, OpenAI, OpenRouter, Gemini) |
| rpg-cli | CLI binary (rpg-encoder) |
| rpg-mcp | MCP server binary (rpg-mcp-server) with 27 tools |

MCP Tools (27)

<details> <summary><strong>Build & Maintain</strong> (4 tools)</summary>

| Tool | Description |
| --- | --- |
| build_rpg | Index the codebase (run once, instant) |
| update_rpg | Incremental update from git changes |
| reload_rpg | Reload graph from disk after external changes |
| rpg_info | Graph statistics, hierarchy overview, per-area lifting coverage |

</details> <details> <summary><strong>Navigate & Search</strong> (5 tools)</summary>

| Tool | Description |
| --- | --- |
| semantic_snapshot | Whole-repo semantic understanding in one call (~25K tokens for 1000 entities) |
| search_node | Search entities by intent or keywords (hybrid embedding + lexical scoring) |
| fetch_node | Get entity metadata, source code, dependencies, and hierarchy context |
| explore_rpg | Traverse dependency graph (upstream, downstream, or both) |
| context_pack | Single-call search + fetch + explore with token budget |

</details> <details> <summary><strong>Plan & Analyze</strong> (7 tools)</summary>

| Tool | Description |
| --- | --- |
| impact_radius | BFS reachability analysis: "what depends on X?" |
| plan_change | Change planning: find relevant entities, modification order, blast radius |
| find_paths | K-shortest dependency paths between two entities |
| slice_between | Extract minimal connecting subgraph between entities |
| analyze_health | Code health: coupling, instability, god objects, clone detection |
| detect_cycles | Find circular dependencies and architectural cycles |
| reconstruct_plan | Dependency-safe reconstruction execution plan |

</details> <details> <summary><strong>Semantic Lifting</strong> (11 tools)</summary>

| Tool | Description |
| --- | --- |
| auto_lift | One-call autonomous lifting via cheap LLM API (Haiku, GPT-4o-mini, OpenRouter, Gemini) |
| lifting_status | Dashboard: coverage, per-area progress, NEXT STEP |
| get_entities_for_lifting | Get entity source code for your agent to analyze |
| submit_lift_results | Submit the agent's semantic features back to the graph |
| finalize_lifting | Aggregate file-level features, rebuild hierarchy metadata |
| get_files_for_synthesis | Get file-level entity features for holistic synthesis |
| submit_file_syntheses | Submit holistic file-level summaries |
| build_semantic_hierarchy | Get domain discovery + hierarchy assignment prompts |
| submit_hierarchy | Apply hierarchy assignments to the graph |
| get_routing_candidates | Get entities needing semantic routing (drifted or newly lifted) |
| submit_routing_decisions | Submit routing decisions (hierarchy path or "keep") |

</details>
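
As an illustration of the BFS reachability analysis that `impact_radius` is described as performing, here is a small self-contained sketch. It walks dependency edges in reverse to collect everything that transitively depends on a target. The edge list, entity names, and the `max_depth` parameter are assumptions for this example; the real tool's parameters and output format may differ.

```python
from collections import defaultdict, deque

def impact_radius(edges, target, max_depth=None):
    """Return {entity: distance} for everything that transitively depends on target."""
    dependents = defaultdict(list)
    for src, dst in edges:               # edge (src, dst) means src depends on dst
        dependents[dst].append(src)

    seen = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if max_depth is not None and seen[node] >= max_depth:
            continue                     # do not expand beyond the depth limit
        for dep in dependents[node]:
            if dep not in seen:
                seen[dep] = seen[node] + 1
                queue.append(dep)

    del seen[target]                     # the target is not part of its own blast radius
    return seen

# Hypothetical dependency edges
edges = [
    ("api.handler", "db.connect"),
    ("jobs.sync", "db.connect"),
    ("cli.main", "api.handler"),
    ("db.connect", "config.load"),
]
print(impact_radius(edges, "db.connect"))
```

Here `cli.main` is reported at distance 2 because it reaches `db.connect` only through `api.handler`, while `config.load` is absent because it is a dependency of the target, not a dependent.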

Supported Languages

15 languages via Tree-sitter:

| Language | Entity Extraction | Dependency Resolution |
| --- | --- | --- |
| Python | Functions, classes, methods | imports, calls, inheritance |
| Rust | Functions, structs, traits, impl methods | use, calls, trait impls |
| TypeScript | Functions, classes, methods, interfaces | imports, calls, inheritance |
| JavaScript | Functions, classes, methods | imports, calls, inheritance |
| Go | Functions, structs, methods, interfaces | imports, calls |
| Java | Classes, methods, interfaces | imports, calls, inheritance |
| C / C++ | Functions, classes, methods, structs | includes, calls, inheritance |
| C# | Classes, methods, interfaces | using, calls, inheritance |
| PHP | Functions, classes, methods | use, calls, inheritance |
| Ruby | Classes, methods, modules | require, calls, inheritance |
| Kotlin | Functions, classes, methods | imports, calls, inheritance |
| Swift | Functions, classes, structs, protocols | imports, calls, inheritance |
| Scala | Functions, classes, objects, traits | imports, calls, inheritance |
| Bash | Functions | source, calls |

Install

MCP server (recommended)

```bash
# Claude Code
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
```

Cursor: add to ~/.cursor/mcp.json

```json
{
  "mcpServers": {
    "rpg": {
      "command": "npx",
      "args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]
    }
  }
}
```

The server auto-detects the project root from the current working directory — no path argument needed.
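
If you already have other servers in your Cursor config, the `rpg` entry needs to be merged in without overwriting them. The helper below is a hypothetical convenience sketch, not something shipped with rpg-encoder; it operates on an in-memory dict and leaves reading and writing `~/.cursor/mcp.json` to you.

```python
import copy
import json

# The server entry from the Cursor config above
RPG_ENTRY = {"command": "npx", "args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]}

def with_rpg_server(config):
    """Return a copy of an MCP config dict with the rpg server added."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["rpg"] = copy.deepcopy(RPG_ENTRY)
    return merged

# Example: an existing config with one unrelated (made-up) server
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(with_rpg_server(existing), indent=2))
```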

<details> <summary><strong>CLI</strong></summary>

```bash
npm install -g rpg-encoder

# Build a graph
rpg-encoder build

# Query
rpg-encoder search "parse entities from source code"
rpg-encoder fetch "src/parser.rs:extract_entities"
rpg-encoder explore "src/parser.rs:extract_entities" --direction both --depth 2
rpg-encoder info

# Autonomous lifting via API
rpg-encoder lift --provider anthropic --dry-run  # estimate cost
rpg-encoder lift --provider anthropic            # lift with Haiku (~$0.02/100 entities)

# Incremental update
rpg-encoder update

# Pre-commit hook (auto-updates graph on commit)
rpg-encoder hook install
```

</details> <details> <summary><strong>Build from source</strong></summary>

```bash
git clone https://github.com/userFRM/rpg-encoder.git
cd rpg-encoder && cargo build --release
```

Then point your MCP config at target/release/rpg-mcp-server.

</details>

Documentation

  • How RPG Compares — honest comparison with GitNexus, Serena, Repomix, and others
  • Paper Fidelity — algorithm-by-algorithm comparison with the research paper
  • Use Cases — practical examples of what RPG enables
  • CHANGELOG — release history

Inspirations & References

rpg-encoder is built on the theoretical framework from the RPG-Encoder research paper, with original extensions inspired by tools across the code intelligence landscape:

  • RPG-Encoder paper (Luo et al., 2026, Microsoft Research) — semantic lifting model, 3-level hierarchy construction, incremental evolution algorithms, formal graph model G = (V_H ∪ V_L, E_dep ∪ E_feature).
  • GitNexus — precomputed relational intelligence, blast radius analysis, Claude Code hooks. Showed that a code graph tool must be invisible to be essential.
  • Serena — symbol-level precision via LSP. Demonstrated that real-time code awareness matters more than batch analysis.
  • TOON — Token-Oriented Object Notation for LLM-optimized output.

This is an independent implementation. All code is original work under the MIT license. Not affiliated with or endorsed by Microsoft.


License

MIT
