io.github.userFRM/rpg-encoder
Coding & Debugging · by userfrm
Builds a semantic code graph with tree-sitter and MCP to help AI assistants understand code structure and semantic relationships.
README
AI coding agents waste most of their tool calls fumbling through your codebase with grep, cat, find, and file reads. rpg-encoder fixes that. It builds a semantic graph of your code with Tree-sitter — not just what calls what, but what every function does — and gives your AI assistant whole-repo understanding via MCP in a single tool call.
Quick Start
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
One command. Works with Claude Code, Cursor, opencode, Windsurf, or any MCP-compatible agent. No Rust toolchain, no cloning, no building — npx downloads a pre-built binary for your platform.
Then open any repo and tell your agent:
"Build and lift the RPG for this repo"
Your agent handles everything: indexes entities (seconds), reads each function and adds intent-level features (a few minutes), organizes them into a semantic hierarchy, and commits .rpg/graph.json for your team.
For repos with ~100+ entities, lifting_status will tell your agent to delegate the lifting loop to a sub-agent or a cheaper model — feature extraction is pattern-matching, not novel reasoning. If your runtime has no sub-agent mechanism, run rpg-encoder lift --provider anthropic|openai from the terminal with an API key — the CLI drives an external LLM directly with no agent involvement. After the CLI finishes, call reload_rpg in your session to load the updated graph. The CLI lifts entities with no features; re-lifting stale entities (features present but outdated after code changes) is handled by the in-session MCP flow, not the CLI.
Once lifted, try:
- "What handles authentication?" — finds code even when nothing is named "auth"
- "Show everything that depends on the database connection"
- "Plan a change to add rate limiting to API endpoints"
Use RPG before grep, cat, find
The server instructions tell your agent to reach for RPG tools FIRST for any
question about code structure or behavior. That reflex matters — grep, cat,
and ad-hoc file reads burn tokens and miss semantic relationships RPG already
knows.
| If you'd otherwise reach for... | Use this instead |
|---|---|
| `grep -r` / `rg` (by intent) | `search_node(query="...")` |
| `grep -r` / `rg` (by name) | `search_node(query="...", mode="snippets")` |
| `cat` / reading a function | `fetch_node(entity_id="file:name")` |
| chained greps for callers/callees | `explore_rpg(entity_id="...", direction="...")` |
| recursive grep for "what depends on X" | `impact_radius(entity_id="...")` |
| `wc -l` / `find` / `tree` | `rpg_info` |
| reading many files for context | `semantic_snapshot` |
| manual search → fetch → explore chains | `context_pack(query="...")` |
| "how do I refactor X safely" | `plan_change(goal="...")` |
Fall back to grep, cat, or file reads only when the query is about literal text
(string search, comments, TODOs, log messages) — not about structure.
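For readers curious what one of these calls looks like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `search_node`. The `query` argument name comes from the table above; the request id and the query text are illustrative, and your agent's MCP client normally constructs this for you.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as sent by MCP clients."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Intent-level search instead of `grep -r`; the query text is illustrative.
request = tool_call_request(1, "search_node", {"query": "validate JWT tokens"})
print(json.dumps(request, indent=2))
```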
How It Works
<p align="center"> <img src="diagrams/how-it-works.webp" alt="Four-stage pipeline: Parse (tree-sitter) → Lift (verb-object features) → Organize (3-level hierarchy) → Understand (LLM gets full repo knowledge)" width="95%" /> </p>

- Parse: Tree-sitter extracts entities (functions, classes, methods) and dependency edges (imports, calls, inheritance) from 15 languages.
- Lift: An LLM (your agent, or a cheap API like Haiku) reads each entity and writes verb-object features: "validate JWT tokens", "serialize config to disk".
- Organize: Features cluster into a 3-level semantic hierarchy (Area → Category → Subcategory) that emerges from what the code does, not the file tree.
- Understand: `semantic_snapshot` compresses the whole graph into ~25K tokens. Your LLM reads it once and knows the repo.
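To make the output of the Lift and Organize stages concrete, here is a minimal sketch of how a lifted entity could be modelled. The class and field names are hypothetical and are not the schema of `.rpg/graph.json`; only the `file:name` id format, the verb-object features, and the three hierarchy levels come from this README.

```python
from dataclasses import dataclass, field


@dataclass
class LiftedEntity:
    """Hypothetical in-memory shape of one entity after lifting."""

    entity_id: str                                        # "file:name" format
    kind: str                                             # e.g. "function", "class", "method"
    features: list[str] = field(default_factory=list)     # verb-object phrases
    hierarchy_path: tuple[str, ...] = ()                  # (Area, Category, Subcategory)

    @property
    def is_lifted(self) -> bool:
        # An entity counts as lifted once it has at least one feature.
        return bool(self.features)


entity = LiftedEntity(
    entity_id="src/auth.py:check_token",
    kind="function",
    features=["validate JWT tokens"],
    hierarchy_path=("Security", "Authentication", "Token validation"),
)
```

An entity that has only been parsed would have an empty `features` list, which is the state the lifting step fills in.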
The semantic snapshot
<p align="center"> <img src="diagrams/semantic-snapshot.webp" alt="The whole repo — ~500K tokens of source — compressed 20x into a ~25K token snapshot containing hierarchy, features, dependencies, and hot spots" width="80%" /> </p>

Instead of grepping through files, the LLM calls `semantic_snapshot` once and receives:
- Hierarchy — every functional area with aggregate features
- Entities — every function, class, method grouped by area, with its semantic features
- Dependency skeleton — condensed call graph with qualified names
- Hot spots — top 10 most-connected entities (the architectural backbone)
~25K tokens covers ~1000 entities. That's 2-3% of a 1M context window — the LLM starts every session already knowing your repo.
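The README does not define "most-connected" for hot spots. One plausible reading is total degree in the dependency graph, which the sketch below assumes. The function name, the edge format, and the sample entity ids are illustrative only.

```python
from collections import Counter


def hot_spots(edges, top_n=10):
    """Rank entities by total degree (incoming plus outgoing dependency edges)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by degree descending, then by name so ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


edges = [
    ("api:handle_login", "auth:check_token"),
    ("api:handle_refresh", "auth:check_token"),
    ("auth:check_token", "db:get_user"),
    ("api:handle_login", "db:get_user"),
]
print(hot_spots(edges, top_n=2))
# [('auth:check_token', 3), ('api:handle_login', 2)]
```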
Self-maintaining graph
<p align="center"> <img src="diagrams/auto-staleness.webp" alt="Git HEAD moves → RPG Server auto-syncs → update_rpg applies additions/modifications/removals → graph always fresh, zero agent action" width="80%" /> </p>

Whenever your working tree changes (committed, staged, or unstaged), the MCP server automatically re-syncs before responding to the next query. A changeset hash over (path, size, mtime) means repeated saves of the same file trigger one sync, and idle queries trigger none. Reverts are detected too: if a previously-dirty file returns to its HEAD state, the graph is restored.
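The changeset-hash idea can be sketched in a few lines. This is an illustration under assumptions, not the server's actual code: it assumes the hashed paths are the changed files reported by git, and the names `changeset_hash` and `SyncGate` are made up for the example.

```python
import hashlib
import os


def changeset_hash(paths):
    """Hash (path, size, mtime) for each path; a missing file contributes a marker."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        try:
            stat = os.stat(path)
            record = f"{path}\0{stat.st_size}\0{stat.st_mtime_ns}\n"
        except FileNotFoundError:
            record = f"{path}\0missing\n"
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()


class SyncGate:
    """Remember the last hash so an unchanged working tree skips the sync."""

    def __init__(self):
        self.last_hash = None

    def needs_sync(self, paths):
        current = changeset_hash(paths)
        changed = current != self.last_hash
        self.last_hash = current
        return changed
```

With this gate, the first query after an edit returns `True` from `needs_sync`, and every later query returns `False` until a size or mtime changes again, which matches the "idle queries trigger none" behavior described above.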
Two ways to lift
| Mode | Command | Cost | Who pays |
|---|---|---|---|
| Agent lifting | "Build and lift the RPG" | Subscription tokens | Your Claude Code / Cursor subscription |
| Autonomous lifting | auto_lift(provider="anthropic", api_key_env="ANTHROPIC_API_KEY") | ~$0.02 per 100 entities | External API key (Haiku, GPT-4o-mini, OpenRouter, Gemini) |
auto_lift calls a cheap external LLM directly — your coding subscription never touches the lifting work. Use api_key_env to resolve keys from environment variables so they never appear in tool call transcripts.
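The reason `api_key_env` keeps secrets out of transcripts is that only the variable name travels through the tool call, and the value is looked up inside the server process. A minimal sketch of that pattern follows; the function names are hypothetical and this is not rpg-encoder's own code.

```python
import os


def resolve_api_key(api_key_env, environ=None):
    """Look up the key by variable name so the secret itself is never passed around."""
    environ = os.environ if environ is None else environ
    key = environ.get(api_key_env, "").strip()
    if not key:
        raise KeyError(f"environment variable {api_key_env} is not set or empty")
    return key


def redact(key):
    """Show only the last four characters, for log output."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


# The transcript only ever contains the name "ANTHROPIC_API_KEY".
fake_env = {"ANTHROPIC_API_KEY": "sk-test-1234"}  # illustrative value
print(redact(resolve_api_key("ANTHROPIC_API_KEY", fake_env)))
```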
Architecture
<p align="center"> <img src="diagrams/architecture.webp" alt="Your codebase (15 languages) → RPG Engine (5 Rust crates: parser, encoder, nav, lift, mcp) → Clients (Claude Code, Cursor, opencode) via MCP Protocol" width="95%" /> </p>

Seven Rust crates, one MCP server binary, one CLI binary:
| Crate | Role |
|---|---|
| `rpg-core` | Graph types (`RPGraph`, `Entity`, `HierarchyNode`), storage, LCA algorithm |
| `rpg-parser` | Tree-sitter entity + dependency extraction (15 languages) |
| `rpg-encoder` | Encoding pipeline, lifting utilities, incremental evolution |
| `rpg-nav` | Search, fetch, explore, snapshot, TOON serialization |
| `rpg-lift` | Autonomous LLM lifting (Anthropic, OpenAI, OpenRouter, Gemini) |
| `rpg-cli` | CLI binary (`rpg-encoder`) |
| `rpg-mcp` | MCP server binary (`rpg-mcp-server`) with 27 tools |
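The table mentions an LCA (lowest common ancestor) algorithm in `rpg-core` without further detail. As a general illustration of the concept on a 3-level hierarchy, and not a description of the crate's implementation, the lowest common ancestor of two hierarchy paths is their longest shared prefix:

```python
def lowest_common_ancestor(path_a, path_b):
    """Return the deepest hierarchy node shared by two root-to-leaf paths."""
    shared = []
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared.append(a)
    return tuple(shared)


# Hypothetical (Area, Category, Subcategory) paths.
token = ("Security", "Authentication", "Token validation")
session = ("Security", "Authentication", "Session handling")
storage = ("Persistence", "Database", "Connection pooling")

print(lowest_common_ancestor(token, session))  # ('Security', 'Authentication')
print(lowest_common_ancestor(token, storage))  # ()
```

An empty result means the two entities share no functional area at all.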
MCP Tools (27)
<details> <summary><strong>Build & Maintain</strong> (4 tools)</summary>

| Tool | Description |
|---|---|
| `build_rpg` | Index the codebase (run once, instant) |
| `update_rpg` | Incremental update from git changes |
| `reload_rpg` | Reload graph from disk after external changes |
| `rpg_info` | Graph statistics, hierarchy overview, per-area lifting coverage |

</details>

| Tool | Description |
|---|---|
| `semantic_snapshot` | Whole-repo semantic understanding in one call (~25K tokens for 1000 entities) |
| `search_node` | Search entities by intent or keywords (hybrid embedding + lexical scoring) |
| `fetch_node` | Get entity metadata, source code, dependencies, and hierarchy context |
| `explore_rpg` | Traverse dependency graph (upstream, downstream, or both) |
| `context_pack` | Single-call search + fetch + explore with token budget |
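"Hybrid embedding + lexical scoring" is not specified further in this README. The sketch below shows one common way to blend the two signals, with an assumed weight `alpha` and toy 2-dimensional vectors standing in for real embeddings. It illustrates why an intent query can match a feature that shares no words with it.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexical_overlap(query, text):
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_score(query, query_vec, feature, feature_vec, alpha=0.5):
    """Blend embedding similarity and word overlap; alpha is an assumed weight."""
    return alpha * cosine(query_vec, feature_vec) + (1 - alpha) * lexical_overlap(query, feature)


# Toy vectors: the first feature is semantically close to the query, the second is not.
query, query_vec = "handles authentication", [1.0, 0.0]
close = hybrid_score(query, query_vec, "validate JWT tokens", [1.0, 0.0])
far = hybrid_score(query, query_vec, "serialize config to disk", [0.0, 1.0])
print(close, far)
```

Neither feature shares a word with the query, so the lexical term is zero for both and the embedding term alone separates them.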

| Tool | Description |
|---|---|
| `impact_radius` | BFS reachability analysis: "what depends on X?" |
| `plan_change` | Change planning: find relevant entities, modification order, blast radius |
| `find_paths` | K-shortest dependency paths between two entities |
| `slice_between` | Extract minimal connecting subgraph between entities |
| `analyze_health` | Code health: coupling, instability, god objects, clone detection |
| `detect_cycles` | Find circular dependencies and architectural cycles |
| `reconstruct_plan` | Dependency-safe reconstruction execution plan |
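The BFS reachability behind a "what depends on X?" query can be sketched as a breadth-first search over reversed dependency edges. This is a generic illustration, not the tool's implementation. The edge format, the `max_depth` parameter, and the entity ids are assumptions made for the example.

```python
from collections import deque


def impact_radius(dependency_edges, target, max_depth=None):
    """Breadth-first search over reversed edges: who depends on `target`?"""
    dependents = {}
    for source, dest in dependency_edges:  # each edge means: source depends on dest
        dependents.setdefault(dest, []).append(source)

    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if max_depth is not None and distance[node] >= max_depth:
            continue  # do not expand past the requested depth
        for dependent in dependents.get(node, []):
            if dependent not in distance:
                distance[dependent] = distance[node] + 1
                queue.append(dependent)
    del distance[target]  # report only the affected entities
    return distance


edges = [
    ("api:handle_login", "auth:check_token"),
    ("auth:check_token", "db:connect"),
    ("jobs:cleanup", "db:connect"),
]
print(impact_radius(edges, "db:connect"))
```

The returned distances give the number of hops from the changed entity, so direct dependents (distance 1) can be reviewed before transitive ones.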

| Tool | Description |
|---|---|
| `auto_lift` | One-call autonomous lifting via cheap LLM API (Haiku, GPT-4o-mini, OpenRouter, Gemini) |
| `lifting_status` | Dashboard: coverage, per-area progress, NEXT STEP |
| `get_entities_for_lifting` | Get entity source code for your agent to analyze |
| `submit_lift_results` | Submit the agent's semantic features back to the graph |
| `finalize_lifting` | Aggregate file-level features, rebuild hierarchy metadata |
| `get_files_for_synthesis` | Get file-level entity features for holistic synthesis |
| `submit_file_syntheses` | Submit holistic file-level summaries |
| `build_semantic_hierarchy` | Get domain discovery + hierarchy assignment prompts |
| `submit_hierarchy` | Apply hierarchy assignments to the graph |
| `get_routing_candidates` | Get entities needing semantic routing (drifted or newly lifted) |
| `submit_routing_decisions` | Submit routing decisions (hierarchy path or "keep") |
Supported Languages
15 languages via Tree-sitter:
| Language | Entity Extraction | Dependency Resolution |
|---|---|---|
| Python | Functions, classes, methods | imports, calls, inheritance |
| Rust | Functions, structs, traits, impl methods | use, calls, trait impls |
| TypeScript | Functions, classes, methods, interfaces | imports, calls, inheritance |
| JavaScript | Functions, classes, methods | imports, calls, inheritance |
| Go | Functions, structs, methods, interfaces | imports, calls |
| Java | Classes, methods, interfaces | imports, calls, inheritance |
| C / C++ | Functions, classes, methods, structs | includes, calls, inheritance |
| C# | Classes, methods, interfaces | using, calls, inheritance |
| PHP | Functions, classes, methods | use, calls, inheritance |
| Ruby | Classes, methods, modules | require, calls, inheritance |
| Kotlin | Functions, classes, methods | imports, calls, inheritance |
| Swift | Functions, classes, structs, protocols | imports, calls, inheritance |
| Scala | Functions, classes, objects, traits | imports, calls, inheritance |
| Bash | Functions | source, calls |
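To show what "entity extraction" and "dependency resolution" mean in practice, here is a small stand-in that uses Python's built-in `ast` module rather than Tree-sitter, and handles Python source only. It collects functions, classes, and methods plus direct calls by plain name. The real parser covers far more cases, and the function below is illustrative rather than part of rpg-encoder.

```python
import ast


def extract_entities(source, filename):
    """Return (entities, call_edges) for one Python file."""
    tree = ast.parse(source, filename=filename)
    entities, call_edges = [], []

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                entity_id = f"{filename}:{child.name}"  # "file:name" style id
                entities.append(entity_id)
                visit(child, entity_id)
            else:
                # Record only direct calls to a plain name, e.g. parse(...).
                if owner is not None and isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    call_edges.append((owner, child.func.id))
                visit(child, owner)

    visit(tree, None)
    return entities, call_edges


source = '''
def load(path):
    return parse(read(path))

class Store:
    def save(self, item):
        validate(item)
'''
entities, calls = extract_entities(source, "demo.py")
print(entities)
print(calls)
```

Attribute calls such as `self.save(...)` and imports are ignored in this sketch.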
Install
MCP server (recommended)
Claude Code:

```bash
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
```

Cursor (add to `~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "rpg": {
      "command": "npx",
      "args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]
    }
  }
}
```
The server auto-detects the project root from the current working directory — no path argument needed.
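The README does not say how the root is detected. A common approach, shown here purely as an assumption, is to walk up from the working directory until a marker such as `.git` or `.rpg` is found. The function name and the choice of markers are hypothetical.

```python
from pathlib import Path


def find_project_root(start, markers=(".rpg", ".git")):
    """Walk up from `start` until a directory contains one of the markers."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in markers):
            return directory
    return start  # no marker found: fall back to the starting directory
```

Calling `find_project_root(Path.cwd())` from a nested source directory would return the enclosing repository root under these assumptions.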
<details> <summary><strong>CLI</strong></summary>

```bash
npm install -g rpg-encoder

# Build a graph
rpg-encoder build

# Query
rpg-encoder search "parse entities from source code"
rpg-encoder fetch "src/parser.rs:extract_entities"
rpg-encoder explore "src/parser.rs:extract_entities" --direction both --depth 2
rpg-encoder info

# Autonomous lifting via API
rpg-encoder lift --provider anthropic --dry-run   # estimate cost
rpg-encoder lift --provider anthropic             # lift with Haiku (~$0.02/100 entities)

# Incremental update
rpg-encoder update

# Pre-commit hook (auto-updates graph on commit)
rpg-encoder hook install
```

</details>

To build from source:

```bash
git clone https://github.com/userFRM/rpg-encoder.git
cd rpg-encoder && cargo build --release
```

Then point your MCP config at `target/release/rpg-mcp-server`.
Documentation
- How RPG Compares — honest comparison with GitNexus, Serena, Repomix, and others
- Paper Fidelity — algorithm-by-algorithm comparison with the research paper
- Use Cases — practical examples of what RPG enables
- CHANGELOG — release history
Inspirations & References
rpg-encoder is built on the theoretical framework from the RPG-Encoder research paper, with original extensions inspired by tools across the code intelligence landscape:
- RPG-Encoder paper (Luo et al., 2026, Microsoft Research): semantic lifting model, 3-level hierarchy construction, incremental evolution algorithms, formal graph model `G = (V_H ∪ V_L, E_dep ∪ E_feature)`.
- GitNexus: precomputed relational intelligence, blast radius analysis, Claude Code hooks. Showed that a code graph tool must be invisible to be essential.
- Serena: symbol-level precision via LSP. Demonstrated that real-time code awareness matters more than batch analysis.
- TOON: Token-Oriented Object Notation for LLM-optimized output.
This is an independent implementation. All code is original work under the MIT license. Not affiliated with or endorsed by Microsoft.
License