io.github.userFRM/rpg-encoder
Coding & Debugging · by userfrm
Builds a semantic code graph with tree-sitter and MCP to help AI assistants understand code structure and semantic relationships.
README
rpg-encoder
[!NOTE] Independent implementation. Built from scratch in Rust by reading the RPG-Encoder paper (Luo et al., 2026, Microsoft Research). No shared code with any other implementation. Not affiliated with or endorsed by Microsoft. All code is original work; the paper is cited for attribution.
Coding agent toolkit for semantic code understanding.
rpg-encoder builds a semantic graph of your codebase. Your coding agent (Claude Code, Cursor, etc.) analyzes the code and adds intent-level features via the MCP interactive protocol. Search by what code does, not what it's named.
[!TIP] New to RPG? See How RPG Compares to understand where it fits alongside Claude Code, Serena, and other tools. For a detailed algorithm-by-algorithm comparison with the research paper, see Paper Fidelity.
Install
The MCP server automatically detects the project root — no path argument needed.
Add to your MCP config (Claude Code ~/.claude.json, Cursor settings, opencode, etc.):
{
"mcpServers": {
"rpg": {
"command": "npx",
"args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]
}
}
}
<details> <summary>Alternative: build from source</summary>

[!TIP] The path argument is optional. When omitted, the server falls back to the current working directory. MCP clients (like opencode, Claude Code, Cursor) launch the server from the workspace directory, so `current_dir()` automatically points to your project. If you pass a path explicitly, it will use that instead.
git clone https://github.com/userFRM/rpg-encoder.git
cd rpg-encoder && cargo build --release
Then use the binary path directly:
{
"mcpServers": {
"rpg": {
"command": "/path/to/rpg-encoder/target/release/rpg-mcp-server"
}
}
}
</details>

<details> <summary><strong>Multi-repo setup</strong></summary>

[!TIP] The path argument is optional — the server defaults to the current working directory. This works automatically because MCP clients launch the server from the workspace directory.
Global config (all repos use cwd)
No path needed — each session uses the directory where the MCP client was started:
{
"mcpServers": {
"rpg": {
"command": "npx",
"args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]
}
}
}
Per-project override (explicit path)
If you need a specific repo, pass the path:
{
"mcpServers": {
"rpg": {
"command": "npx",
"args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server", "/path/to/this/repo"]
}
}
}
The project-level config overrides the global one. Restart Claude Code after creating/modifying configs.
</details>

Lifecycle
graph LR
A[Install] --> B[Build]
B --> C[Lift]
C --> D[Use]
D --> E[Update]
E --> C
You install it. Your agent does the rest.
Getting Started
Tell your coding agent:
"Build and lift the RPG for this repo"
That's it. The agent handles everything. Here's what happens:
- Build — Indexes all code entities and dependencies (~5 seconds)
- Lift — Agent analyzes each function/class and adds semantic features (~2 min per 100 entities)
- Organize — Agent discovers functional domains and builds a semantic hierarchy (~30 seconds)
- Save — Graph is written to `.rpg/graph.json` — commit it so everyone benefits
Once lifted, try queries like:
- "What handles authentication?"
- "Show me everything that depends on the database connection"
- "Plan a change to add rate limiting to API endpoints"
The RPG (Repository Planning Graph) is a hierarchical, dual-view representation from the research papers cited below:
- Parse — Extract entities (functions, classes, methods) and dependency edges (imports, invocations, inheritance) using tree-sitter. Build a file-path hierarchy.
- Lift — Your coding agent analyzes entity source code and adds verb-object semantic features (e.g., "validate user credentials", "serialize config to disk") via the MCP interactive protocol (`get_entities_for_lifting` → `submit_lift_results`).
- Hierarchy — Your agent discovers functional domains and assigns entities to a 3-level semantic hierarchy (`build_semantic_hierarchy` → `submit_hierarchy`).
- Ground — Anchor hierarchy nodes to directories via an LCA algorithm, and resolve cross-file dependency edges.
The graph is saved to .rpg/graph.json and should be committed to your repo — this way
all collaborators and AI tools get instant semantic search without rebuilding.
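The grounding step's LCA anchoring can be pictured as finding the deepest directory shared by all files under a hierarchy node. A simplified sketch (not the crate's actual implementation):

```python
def lca_directory(paths):
    """Lowest common ancestor directory of a set of file paths."""
    if not paths:
        return ""
    # Split each path into directory components, dropping the filename.
    split = [p.split("/")[:-1] for p in paths]
    prefix = []
    for parts in zip(*split):
        if len(set(parts)) == 1:  # all paths agree at this depth
            prefix.append(parts[0])
        else:
            break
    return "/".join(prefix)

# A node covering these two files would be anchored at "src/auth".
print(lca_directory(["src/auth/login.py", "src/auth/token.py"]))  # → src/auth
```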
MCP Tools
Build & Maintain
| Tool | Description |
|---|---|
| `build_rpg` | Index the codebase (run once, instant) |
| `update_rpg` | Incremental update from git changes |
| `reload_rpg` | Reload graph from disk after external changes |
| `rpg_info` | Graph statistics, hierarchy overview, per-area lifting coverage |
Semantic Lifting
| Tool | Description |
|---|---|
| `lifting_status` | Dashboard — coverage, per-area progress, next step |
| `get_entities_for_lifting` | Get entity source code for your agent to analyze |
| `submit_lift_results` | Submit the agent's semantic features back to the graph |
| `finalize_lifting` | Aggregate file-level features, rebuild hierarchy metadata |
| `get_files_for_synthesis` | Get file-level entity features for holistic synthesis |
| `submit_file_syntheses` | Submit holistic file-level summaries |
| `build_semantic_hierarchy` | Get domain discovery + hierarchy assignment prompts |
| `submit_hierarchy` | Apply hierarchy assignments to the graph |
| `get_routing_candidates` | Get entities needing semantic routing (drifted or newly lifted) |
| `submit_routing_decisions` | Submit routing decisions (hierarchy path or "keep") |
Navigate & Search
| Tool | Description |
|---|---|
| `search_node` | Search entities by intent or keywords (hybrid embedding + lexical scoring) |
| `fetch_node` | Get entity metadata, source code, dependencies, and hierarchy context |
| `explore_rpg` | Traverse the dependency graph (upstream, downstream, or both) |
| `context_pack` | Single-call search + fetch + explore with a token budget |
Plan & Analyze
| Tool | Description |
|---|---|
| `impact_radius` | BFS reachability analysis — "what depends on X?" |
| `plan_change` | Change planning — find relevant entities, modification order, blast radius |
| `find_paths` | K-shortest dependency paths between two entities |
| `slice_between` | Extract the minimal connecting subgraph between entities |
| `reconstruct_plan` | Dependency-safe reconstruction execution plan |
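The idea behind `impact_radius` can be illustrated with a plain BFS over reverse dependency edges. A toy sketch (the real tool operates on the stored graph, with its own options):

```python
from collections import deque

def impact_radius(reverse_deps, start, max_depth=None):
    """BFS over "who depends on me" edges.

    reverse_deps: dict mapping entity -> list of entities that depend on it.
    Returns the set of entities reachable from `start` (excluding start).
    """
    seen, frontier, depth = {start}, deque([start]), 0
    impacted = set()
    while frontier and (max_depth is None or depth < max_depth):
        depth += 1
        for _ in range(len(frontier)):  # process one BFS layer
            node = frontier.popleft()
            for dep in reverse_deps.get(node, []):
                if dep not in seen:
                    seen.add(dep)
                    impacted.add(dep)
                    frontier.append(dep)
    return impacted

# Everything transitively depending on the database connection:
graph = {"db": ["repo"], "repo": ["api"]}
print(impact_radius(graph, "db"))  # → {'repo', 'api'}
```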
Lifting: What It Is
Lifting is the process where your coding agent reads each function, class, and method in your codebase and describes what it does in plain English — verb-object features like "validate user credentials" or "serialize config to disk". These features power semantic search: find code by what it does, not what it's named.
- No API keys needed — your connected coding agent (Claude Code, Cursor, etc.) is the LLM
- One-time cost — lift once, commit `.rpg/`, and every future session starts instantly
- Resumable — if interrupted, `lifting_status` picks up exactly where you left off
- Incremental — after code changes, `update_rpg` detects what moved and only re-lifts those entities
- Scoped — lift the whole repo or just a subdirectory (`"src/auth/**"`)
- Ask your agent to "lift the code" (or call `get_entities_for_lifting` with a scope)
- The tool returns entity source code with analysis instructions
- Your agent analyzes the code and calls `submit_lift_results` with semantic features
- The agent continues through all batches automatically, dispatching subagents for large repos
- After lifting: `finalize_lifting` → `build_semantic_hierarchy` → `submit_hierarchy`
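The batch loop above can be sketched as a driver. Here `call_tool` is a hypothetical stand-in for the agent's MCP client, and the payload field names are assumptions based on the tool names, not the server's actual schema:

```python
def lift_all(call_tool, analyze):
    """Drive the lifting protocol until no unlifted entities remain.

    call_tool: hypothetical MCP client, (tool_name, args) -> dict
    analyze:   the agent's analysis, entity -> list of verb-object features
    """
    while True:
        batch = call_tool("get_entities_for_lifting", {"scope": "**"})
        entities = batch.get("entities", [])
        if not entities:
            break  # lifting_status would now report full coverage
        results = [{"id": e["id"], "features": analyze(e)} for e in entities]
        call_tool("submit_lift_results", {"results": results})
    call_tool("finalize_lifting", {})
```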
Supported Languages
| Language | Entity Extraction | Dependency Resolution |
|---|---|---|
| Python | Functions, classes, methods | imports, calls, inheritance |
| Rust | Functions, structs, traits, impl methods | use statements, calls, trait impls |
| TypeScript | Functions, classes, methods, interfaces | imports, calls, inheritance |
| JavaScript | Functions, classes, methods | imports, calls, inheritance |
| Go | Functions, structs, methods, interfaces | imports, calls |
| Java | Classes, methods, interfaces | imports, calls, inheritance |
| C | Functions, structs | includes, calls |
| C++ | Functions, classes, methods, structs | includes, calls, inheritance |
| C# | Classes, methods, interfaces | using, calls, inheritance |
| PHP | Functions, classes, methods | use, calls, inheritance |
| Ruby | Classes, methods, modules | require, calls, inheritance |
| Kotlin | Functions, classes, methods | imports, calls, inheritance |
| Swift | Functions, classes, structs, protocols | imports, calls, inheritance |
| Scala | Functions, classes, objects, traits | imports, calls, inheritance |
| Bash | Functions | source, calls |
CLI

The CLI provides structural operations (no semantic lifting — use the MCP server for that).
# Install
npm install -g rpg-encoder
# Build a graph
rpg-encoder build
rpg-encoder build --include "src/**/*.py" --exclude "tests/**"
# Query
rpg-encoder search "parse entities from source code"
rpg-encoder fetch "src/parser.rs:extract_entities"
rpg-encoder explore "src/parser.rs:extract_entities" --direction both --depth 2
rpg-encoder info
# Incremental update
rpg-encoder update
rpg-encoder update --since abc1234
# Paper-style reconstruction schedule (topological + coherent batches)
rpg-encoder reconstruct-plan --max-batch-size 8 --format text
rpg-encoder reconstruct-plan --format json
# Pre-commit hook (auto-updates graph on every commit)
rpg-encoder hook install
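The reconstruction schedule produced by `reconstruct-plan` can be pictured as Kahn's algorithm with capped batch sizes: every batch only depends on earlier batches. A sketch of the idea (the real planner also optimizes batches for coherence):

```python
from collections import defaultdict

def reconstruction_batches(deps, max_batch_size=8):
    """Topological batches: each entity's dependencies land in earlier batches.

    deps: dict mapping entity -> list of entities it depends on.
    """
    indegree = {n: 0 for n in deps}
    dependents = defaultdict(list)
    for node, requires in deps.items():
        for r in requires:
            indegree[node] += 1
            dependents[r].append(node)
    ready = sorted(n for n, d in indegree.items() if d == 0)
    batches = []
    while ready:
        batch, ready = ready[:max_batch_size], ready[max_batch_size:]
        batches.append(batch)
        for node in batch:  # releasing a node may unblock its dependents
            for dep in dependents[node]:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    ready.append(dep)
        ready.sort()  # deterministic ordering within each batch
    return batches

deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(reconstruction_batches(deps))  # → [['a'], ['b', 'c'], ['d']]
```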
Configuration

Create `.rpg/config.toml` in your project root (all fields optional):
[encoding]
batch_size = 50 # Entities per lifting batch
max_batch_tokens = 8000 # Token budget per batch
drift_threshold = 0.5 # Jaccard distance midpoint reference
drift_ignore_threshold = 0.3 # Below: minor edit, in-place update
drift_auto_threshold = 0.7 # Above: auto-queue for re-routing
[navigation]
search_result_limit = 10
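The drift thresholds above gate what happens to an entity's features after its code changes. A sketch of the intended decision, assuming drift is measured as Jaccard distance between old and new code token sets (the middle-band label is illustrative):

```python
def classify_drift(old_tokens, new_tokens, ignore=0.3, auto=0.7):
    """Classify how much an entity changed between two versions.

    Jaccard distance = 1 - |A ∩ B| / |A ∪ B|.
    Below `ignore`: minor edit, update features in place.
    Above `auto`:   large change, auto-queue for semantic re-routing.
    In between:     flag the entity for re-lifting.
    """
    a, b = set(old_tokens), set(new_tokens)
    union = a | b
    distance = 1 - len(a & b) / len(union) if union else 0.0
    if distance < ignore:
        return "in_place"
    if distance > auto:
        return "re_route"
    return "re_lift"
```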
rpg-encoder/
├── rpg-core Core graph types (RPGraph, Entity, HierarchyNode), storage, LCA
├── rpg-parser Tree-sitter entity + dependency extraction (15 languages)
├── rpg-encoder Encoding pipeline, semantic lifting utilities, incremental evolution
│ └── prompts/ Prompt templates (embedded via include_str!)
├── rpg-nav Search, fetch, explore, TOON serialization
├── rpg-cli CLI binary (rpg-encoder)
└── rpg-mcp MCP server binary (rpg-mcp-server)
| Aspect | Paper (Microsoft) | This Repo |
|---|---|---|
| Implementation | Python (unreleased) | Rust (available now) |
| Lifting strategy | Full upfront via API | Progressive — your coding agent lifts via MCP |
| Semantic routing | LLM-based | LLM-based (via MCP routing protocol) |
| Feature search | Embedding-based | Hybrid embedding + lexical (BGE-small-en-v1.5) |
| MCP server | Described, not shipped | Working, with 23 tools |
| SWE-bench evaluation | 93.7% Acc@5 | Self-eval: MRR 0.59, Acc@10 85% (benchmark) |
| Languages | Python-focused | 15 languages |
| TOON format | Not described | Implemented for token efficiency |
FAQ

Do I need an API key or a local LLM?
No. Your connected coding agent (Claude Code, Cursor, etc.) is the LLM. rpg-encoder sends source code to the agent via MCP tools, the agent analyzes it and sends back semantic features. No API keys, no external services, no local model downloads.
How long does lifting take?
Roughly 2 minutes per 100 entities. A small project (50 files, ~200 entities) takes about 5 minutes. A large project (500+ files) should use parallel subagents — your agent handles this automatically. Build and hierarchy steps are near-instant.
What happens when I delete or rename files?
Run update_rpg (or use the pre-commit hook). It diffs against the last indexed commit,
removes deleted entities, re-extracts renamed/modified files, and marks changed entities
for re-lifting. The graph stays consistent without a full rebuild.
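The net effect of an incremental update can be pictured as a set diff over entity IDs and content hashes. A toy sketch (the real update diffs git commits and re-parses only the changed files):

```python
def diff_entities(old, new):
    """old/new: dict mapping entity id -> content hash.

    Added and changed entities are marked for re-lifting;
    removed ones are dropped along with their edges.
    """
    removed = [k for k in old if k not in new]
    added = [k for k in new if k not in old]
    changed = [k for k in new if k in old and old[k] != new[k]]
    return {"removed": sorted(removed), "relift": sorted(added + changed)}
```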
Can I lift only part of the codebase?
Yes. Pass a file glob to get_entities_for_lifting: "src/auth/**", "crates/rpg-core/**",
etc. You can also use .rpgignore (gitignore syntax) to permanently exclude files like
vendored dependencies or generated code.
What if lifting gets interrupted?
The graph is saved to disk after every submit_lift_results call. Start a new session,
call lifting_status, and it picks up exactly where you left off — only unlifted entities
are returned.
How does semantic search work?
search_node uses hybrid scoring: BGE-small-en-v1.5 embeddings for semantic similarity
plus lexical matching for exact names and paths. Query with intent ("handle authentication")
or exact identifiers ("AuthService::validate") — both work.
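A minimal picture of hybrid scoring: blend embedding similarity with a lexical overlap signal so both intent queries and exact identifiers rank well. The 0.7/0.3 split and the token-overlap formula are illustrative assumptions, not rpg-nav's actual scorer:

```python
def hybrid_score(query, entity, semantic_sim, w_sem=0.7, w_lex=0.3):
    """Blend embedding similarity with a crude lexical match signal.

    semantic_sim: cosine similarity from an embedding model
    (e.g. BGE-small), passed in precomputed for this sketch.
    """
    q_tokens = set(query.lower().replace("::", " ").split())
    text = (entity["name"] + " " + " ".join(entity.get("features", []))).lower()
    e_tokens = set(text.replace("::", " ").split())
    # Fraction of query tokens that appear in the entity's name/features.
    lexical = len(q_tokens & e_tokens) / len(q_tokens) if q_tokens else 0.0
    return w_sem * semantic_sim + w_lex * lexical
```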
Should I commit .rpg/ to the repo?
Yes. The .rpg/graph.json file contains the full semantic graph. Committing it means
collaborators and CI agents get instant semantic search without re-lifting. The graph
is deterministic (sorted maps, stable serialization), so diffs are meaningful.
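Deterministic serialization is what keeps those diffs meaningful. In Python terms the same property looks like this (the Rust implementation achieves it with sorted maps and stable serialization):

```python
import json

def stable_dump(graph):
    """Serialize with sorted keys so the same graph always produces
    identical bytes -- unchanged entities never show up in a git diff."""
    return json.dumps(graph, sort_keys=True, indent=2)

a = stable_dump({"entities": {"b": 1, "a": 2}})
b = stable_dump({"entities": {"a": 2, "b": 1}})
assert a == b  # insertion order no longer affects the output
```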
What about monorepos or very large codebases?
Use scoped lifting to process one area at a time ("packages/api/**", "services/auth/**").
Your coding agent will automatically dispatch parallel subagents for large scopes. The
incremental update system (update_rpg) keeps the graph current without full rebuilds.
For very large repos, use .rpgignore to exclude vendored code, generated files, and
test fixtures.
References
This project is based on the following research papers. All credit for the theoretical framework, algorithms, and evaluation methodology belongs to the original authors.
- RPG-Encoder: Luo, J., Yin, C., Zhang, X., et al. "Closing the Loop: Universal Repository Representation with RPG-Encoder." arXiv:2602.02084, 2026. [Paper] [Project Page] [Official Code]
- RPG (ZeroRepo): Luo, J., Yin, C., et al. "RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph." arXiv:2509.16198, 2025. [Paper]
- TOON: Token-Oriented Object Notation — an LLM-optimized data format used for MCP tool output and LLM response parsing. [Spec]
License
Licensed under the MIT License.
This is an independent implementation. The RPG-Encoder paper and its associated intellectual property belong to Microsoft Research and the paper's authors. This project implements the publicly described algorithms and does not contain any code from Microsoft.