io.github.joesaby/doctree-mcp
Coding and Debugging, by joesaby
BM25 search + tree navigation over markdown docs for AI agents. No embeddings, no LLM calls.
README
doctree-mcp
Agentic document retrieval over markdown — BM25 search + tree navigation via MCP.
Give an AI agent structured access to your markdown docs: it searches with BM25, reads the outline, reasons about which sections matter, and retrieves only what it needs. No vector DB, no embeddings, no LLM calls at index time.
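To make the ranking step concrete, here is a minimal sketch of textbook BM25 scoring over an in-memory set of documents. It is not doctree-mcp's implementation: the `Doc` type, the `tokenize` and `bm25Rank` helpers, and the `k1 = 1.2`, `b = 0.75` defaults are assumptions chosen for illustration, and the real index adds stemming, positional data, and facets that this sketch leaves out.

```typescript
// Hypothetical types and helpers for illustration; not doctree-mcp's API.
type Doc = { id: string; text: string };
type Ranked = { id: string; score: number };

// Lowercase and split on anything that is not a letter or digit.
const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Ranked[] {
  const entries = docs.map((doc) => ({ id: doc.id, tokens: tokenize(doc.text) }));
  const totalLen = entries.reduce((sum, e) => sum + e.tokens.length, 0);
  const avgLen = totalLen / Math.max(entries.length, 1);

  // Document frequency: how many documents contain each term.
  const docFreq = new Map<string, number>();
  entries.forEach((e) => {
    new Set(e.tokens).forEach((term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
  });

  const queryTerms = tokenize(query);
  return entries
    .map((e) => {
      let score = 0;
      for (const term of queryTerms) {
        const tf = e.tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        const n = docFreq.get(term) ?? 0;
        // Rare terms get a higher inverse document frequency.
        const idf = Math.log(1 + (entries.length - n + 0.5) / (n + 0.5));
        // Long documents are penalised relative to the average length.
        const lengthNorm = 1 - b + b * (e.tokens.length / avgLen);
        score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
      }
      return { id: e.id, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}
```

Because scoring needs only term counts and document lengths, this kind of index can be built without embeddings or model calls, which is the property the description above relies on.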
Why
Standard RAG gives agents a bag of loosely relevant paragraphs. This gives them a table of contents they can reason over, plus a search engine that actually ranks by relevance.
search_documents("auth token refresh") → find candidate docs (BM25 ranked)
get_tree("docs:auth:middleware") → see the heading hierarchy
  [n4] ## Token Refresh Flow (180 words)
    [n5] ### Automatic Refresh (90 words)
    [n6] ### Manual Refresh API (150 words)
    [n7] ### Error Handling (200 words)
navigate_tree("docs:auth:middleware", "docs:auth:middleware:n4") → get exactly n4+n5+n6+n7
Context budget: 2K-8K tokens with precise content, vs 4K-20K tokens of noisy chunks from vector RAG.
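The outline-then-retrieve flow above can be sketched as a heading tree built from raw markdown, plus a helper that collects a node and its descendants. This is an illustrative approximation only: the project uses Bun's native markdown parser, whereas this sketch uses a simple regex over ATX headings (so a `#` line inside a fenced code block would be misread as a heading). The names `TreeNode`, `buildTree`, `findNode`, `subtree`, and `wordCount` are hypothetical, and the short `n1`, `n2`, ... IDs only mimic the style of the example above.

```typescript
// Hypothetical tree shape for illustration; not doctree-mcp's internal types.
type TreeNode = {
  id: string;
  level: number;
  title: string;
  body: string;
  children: TreeNode[];
};

function buildTree(markdown: string): TreeNode[] {
  const roots: TreeNode[] = [];
  const stack: TreeNode[] = []; // currently open ancestors, shallowest first
  let counter = 0;
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line);
    if (!match) {
      // Body text belongs to the most recently opened heading.
      const current = stack[stack.length - 1];
      if (current) current.body += line + "\n";
      continue;
    }
    const node: TreeNode = {
      id: `n${++counter}`,
      level: match[1].length,
      title: match[2].trim(),
      body: "",
      children: [],
    };
    // Close headings at the same or a deeper level before attaching.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) stack.pop();
    const parent = stack[stack.length - 1];
    (parent ? parent.children : roots).push(node);
    stack.push(node);
  }
  return roots;
}

const wordCount = (text: string): number => text.split(/\s+/).filter(Boolean).length;

function findNode(nodes: TreeNode[], id: string): TreeNode | undefined {
  for (const node of nodes) {
    if (node.id === id) return node;
    const hit = findNode(node.children, id);
    if (hit) return hit;
  }
  return undefined;
}

// A node followed by all of its descendants, in document order.
function subtree(node: TreeNode): TreeNode[] {
  const out: TreeNode[] = [node];
  for (const child of node.children) out.push(...subtree(child));
  return out;
}
```

An agent would read only titles and word counts first, then request `subtree` for the one section it picked, which is what keeps the context budget small.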
Quick Start
# Install Bun if you don't have it
curl -fsSL https://bun.com/install | bash
# Run directly — no clone needed
DOCS_ROOT=/path/to/your/markdown/docs bunx doctree-mcp
Claude Desktop Configuration
{
"mcpServers": {
"doctree": {
"command": "bunx",
"args": ["doctree-mcp"],
"env": {
"DOCS_ROOT": "/path/to/your/markdown/docs"
}
}
}
}
Run from source
git clone https://github.com/joesaby/doctree-mcp.git
cd doctree-mcp
bun install
DOCS_ROOT=./docs bun run serve # stdio
DOCS_ROOT=./docs bun run serve:http # HTTP (port 3100)
MCP Tools
| Tool | Description |
|---|---|
| list_documents | Browse catalog with tag/keyword filtering and facet counts |
| search_documents | BM25 keyword search with facet filters and glossary expansion |
| get_tree | Hierarchical outline for agent reasoning — structure and word counts, no content |
| get_node_content | Retrieve full text of specific sections by node ID |
| navigate_tree | Get a section and all descendants in one call |
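On the wire, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such payloads by hand to show their shape. The envelope follows the MCP specification, but the argument keys (`query`, `doc_id`) are assumptions for illustration; the server's actual input schemas are what a `tools/list` call returns, and the `toolCall` helper is hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's tools/call method.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

let nextId = 1;

// Hypothetical helper: wraps a tool name and arguments in the request envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are assumed, not taken from the server's schema.
const searchRequest = toolCall("search_documents", { query: "auth token refresh" });
const outlineRequest = toolCall("get_tree", { doc_id: "docs:auth:middleware" });

console.log(JSON.stringify(searchRequest));
console.log(JSON.stringify(outlineRequest));
```

In practice an MCP client library or host application (such as Claude Desktop) produces these messages, so hand-building them is mainly useful for debugging a stdio or HTTP session.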
Configuration
# .env
DOCS_ROOT=./docs # path to your markdown repository
DOCS_GLOB=**/*.md # file glob pattern
See docs/CONFIGURATION.md for multiple collections, ranking tuning, frontmatter best practices, and glossary setup.
Performance
| Operation | Latency | Token cost |
|---|---|---|
| Full index (900 docs) | 2-5s | 0 LLM tokens |
| Incremental re-index (5 changed) | ~50ms | 0 LLM tokens |
| Search | 5-30ms | ~300-1K tokens |
| Search with facet filters | 2-15ms | ~200-800 tokens |
| Tree outline | <1ms | ~200-800 tokens |
Memory: ~25-50MB for 900 docs with full positional index and facets.
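The gap between a full index and an incremental re-index comes from re-parsing only the files that changed. The README does not say how doctree-mcp detects changes, so the following is one possible approach under stated assumptions: compare a content hash per path against the previous snapshot. The `fnv1a` hash (an FNV-1a style hash over UTF-16 code units), the `Snapshot` type, and `diffSnapshot` are all hypothetical names for illustration.

```typescript
// Small non-cryptographic hash; enough to notice edits in this sketch.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Maps a file path to the hash of its content at the last index run.
type Snapshot = Map<string, string>;

type Diff = { changed: string[]; removed: string[]; next: Snapshot };

// Compares current file contents (path -> text) against the previous snapshot.
function diffSnapshot(previous: Snapshot, files: Map<string, string>): Diff {
  const next: Snapshot = new Map();
  const changed: string[] = [];
  files.forEach((content, path) => {
    const hash = fnv1a(content);
    next.set(path, hash);
    // New files and files whose hash differs both need re-indexing.
    if (previous.get(path) !== hash) changed.push(path);
  });
  const removed: string[] = [];
  previous.forEach((_hash, path) => {
    if (!files.has(path)) removed.push(path);
  });
  return { changed, removed, next };
}
```

Only the paths in `changed` would be re-parsed and only those in `removed` dropped from the index, so the cost scales with the size of the edit rather than the size of the repository.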
Docs
- Architecture & Design — BM25, tree navigation, Pagefind/PageIndex attribution
- Configuration Reference — env vars, frontmatter, ranking tuning, glossary
- Competitive Analysis — comparison with PageIndex, QMD, GitMCP, Context7
Standing on Shoulders
- PageIndex — Hierarchical tree navigation and the agent reasoning workflow
- Pagefind by CloudCannon — BM25 scoring, positional index, filter facets, density excerpts, stemming, and more. Full attribution in DESIGN.md.
- Bun.markdown by Oven — Native CommonMark parser enabling zero-cost tree construction from raw markdown
License
MIT