io.github.joesaby/doctree-mcp

Coding and Debugging

by joesaby

BM25 search + tree navigation over markdown docs for AI agents. No embeddings, no LLM calls.

README

doctree-mcp

Agentic document retrieval over markdown — BM25 search + tree navigation via MCP.

Give an AI agent structured access to your markdown docs: it searches with BM25, reads the outline, reasons about which sections matter, and retrieves only what it needs. No vector DB, no embeddings, no LLM calls at index time.
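To make the "searches with BM25" step concrete, here is a minimal sketch of BM25 ranking over a handful of in-memory documents. The tokenizer, the `bm25Rank` name, and the `k1`/`b` defaults are illustrative assumptions; doctree-mcp's real index (positional index, stemming, facets) is more involved and is not reproduced here.

```typescript
// Minimal BM25 sketch. Tokenizer and k1/b defaults are illustrative
// assumptions, not doctree-mcp's actual settings.
type Doc = { id: string; text: string };

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(docs.length, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const terms of tokenized) {
    new Set(terms).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }

  const queryTerms = tokenize(query);
  return docs
    .map((doc, i) => {
      const terms = tokenized[i] ?? [];
      let score = 0;
      for (const q of queryTerms) {
        const tf = terms.filter((t) => t === q).length;
        if (tf === 0) continue;
        const n = df.get(q) ?? 0;
        // Rare terms get a higher inverse document frequency.
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        // Length normalization: long documents need more hits to score equally.
        const norm = tf + k1 * (1 - b + (b * terms.length) / avgLen);
        score += (idf * tf * (k1 + 1)) / norm;
      }
      return { id: doc.id, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}
```

Calling `bm25Rank(docs, "auth token refresh")` returns only the documents that contain at least one query term, best match first.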

Why

Standard RAG gives agents a bag of loosely relevant paragraphs. This gives them a table of contents they can reason over, plus a search engine that actually ranks by relevance.

code
search_documents("auth token refresh")     → find candidate docs (BM25 ranked)
get_tree("docs:auth:middleware")           → see the heading hierarchy
  [n4] ## Token Refresh Flow (180 words)
    [n5] ### Automatic Refresh (90 words)
    [n6] ### Manual Refresh API (150 words)
    [n7] ### Error Handling (200 words)
navigate_tree("docs:auth:middleware", "docs:auth:middleware:n4") → get exactly n4+n5+n6+n7

Context budget: 2K-8K tokens with precise content, vs 4K-20K tokens of noisy chunks from vector RAG.
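The outline-then-subtree flow above can be sketched as follows. This is a simplified illustration that detects ATX headings with a regex and assigns hypothetical `n<index>` IDs in document order; it ignores fenced code blocks and setext headings, and it does not use the Bun.markdown parser the project credits.

```typescript
// Sketch: build a heading outline from markdown, then fetch a node plus all
// of its descendants. Regex parsing and the ID scheme are assumptions.
type OutlineNode = {
  id: string;
  level: number;
  title: string;
  words: number; // words in the section body, excluding child sections
  children: OutlineNode[];
};

function buildOutline(markdown: string): OutlineNode[] {
  const roots: OutlineNode[] = [];
  const stack: OutlineNode[] = []; // currently open ancestors, shallowest first
  let counter = 0;

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line);
    if (match) {
      const level = (match[1] ?? "").length;
      const title = (match[2] ?? "").trim();
      const node: OutlineNode = { id: `n${++counter}`, level, title, words: 0, children: [] };
      // Close sections at the same or a deeper level before attaching.
      while (stack.length > 0 && (stack[stack.length - 1] as OutlineNode).level >= level) stack.pop();
      const parent = stack[stack.length - 1];
      (parent ? parent.children : roots).push(node);
      stack.push(node);
    } else {
      // Body text counts toward the most recently opened heading.
      const open = stack[stack.length - 1];
      if (open) open.words += line.split(/\s+/).filter(Boolean).length;
    }
  }
  return roots;
}

function flatten(node: OutlineNode): OutlineNode[] {
  const out: OutlineNode[] = [node];
  for (const child of node.children) out.push(...flatten(child));
  return out;
}

// Returns the node with the given ID followed by its descendants in
// document order, or an empty array if the ID is unknown.
function subtree(nodes: OutlineNode[], id: string): OutlineNode[] {
  for (const node of nodes) {
    if (node.id === id) return flatten(node);
    const found = subtree(node.children, id);
    if (found.length > 0) return found;
  }
  return [];
}
```

Because the outline carries only titles and word counts, an agent can decide which subtree is worth its context budget before any body text is fetched.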

Quick Start

bash
# Install Bun if you don't have it
curl -fsSL https://bun.com/install | bash

# Run directly — no clone needed
DOCS_ROOT=/path/to/your/markdown/docs bunx doctree-mcp

Claude Desktop Configuration

json
{
  "mcpServers": {
    "doctree": {
      "command": "bunx",
      "args": ["doctree-mcp"],
      "env": {
        "DOCS_ROOT": "/path/to/your/markdown/docs"
      }
    }
  }
}

Run from source

bash
git clone https://github.com/joesaby/doctree-mcp.git
cd doctree-mcp
bun install
DOCS_ROOT=./docs bun run serve        # stdio
DOCS_ROOT=./docs bun run serve:http   # HTTP (port 3100)

MCP Tools

| Tool | Description |
| --- | --- |
| list_documents | Browse catalog with tag/keyword filtering and facet counts |
| search_documents | BM25 keyword search with facet filters and glossary expansion |
| get_tree | Hierarchical outline for agent reasoning: structure and word counts, no content |
| get_node_content | Retrieve full text of specific sections by node ID |
| navigate_tree | Get a section and all descendants in one call |
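For readers wiring these tools into their own client, the sketch below shows what the search, outline, and subtree calls might look like as MCP `tools/call` requests over JSON-RPC 2.0. The tool names come from the table above; the argument keys (`query`, `doc_id`, `node_id`) are hypothetical placeholders, so check the schemas the server reports via `tools/list` for the real ones.

```typescript
// Sketch: JSON-RPC 2.0 envelopes for a search -> outline -> subtree workflow.
// Argument keys (query, doc_id, node_id) are hypothetical placeholders.
type ToolCall = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCall {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const workflow: ToolCall[] = [
  // 1. Find candidate documents.
  toolCall(1, "search_documents", { query: "auth token refresh" }),
  // 2. Inspect the heading hierarchy of the best candidate.
  toolCall(2, "get_tree", { doc_id: "docs:auth:middleware" }),
  // 3. Pull one section and all of its descendants.
  toolCall(3, "navigate_tree", { doc_id: "docs:auth:middleware", node_id: "docs:auth:middleware:n4" }),
];
```

Each request would be serialized with `JSON.stringify` and sent over whichever transport the server is running on (stdio or HTTP).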

Configuration

bash
# .env
DOCS_ROOT=./docs    # path to your markdown repository
DOCS_GLOB=**/*.md   # file glob pattern

See docs/CONFIGURATION.md for multiple collections, ranking tuning, frontmatter best practices, and glossary setup.
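As a rough picture of what glossary expansion does at query time, the sketch below widens a query with the expansions listed for each term. The glossary shape used here (a term mapped to a list of expansions) and the `expandQuery` name are hypothetical; the actual glossary format is described in docs/CONFIGURATION.md and may differ.

```typescript
// Sketch of glossary-based query expansion. The glossary shape is a
// hypothetical format, not doctree-mcp's documented schema.
type Glossary = Record<string, string[]>;

function expandQuery(query: string, glossary: Glossary): string[] {
  const expanded: string[] = [];
  const seen = new Set<string>();
  const add = (term: string): void => {
    if (!seen.has(term)) {
      seen.add(term);
      expanded.push(term);
    }
  };

  for (const term of query.toLowerCase().split(/\s+/).filter(Boolean)) {
    add(term); // always keep the original term
    for (const extra of glossary[term] ?? []) {
      // Multi-word expansions are split into individual search terms.
      extra.toLowerCase().split(/\s+/).filter(Boolean).forEach((t) => add(t));
    }
  }
  return expanded;
}
```

With a glossary entry mapping `jwt` to `json web token`, a query for "JWT refresh" would also match documents that spell the term out.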

Performance

| Operation | Latency | Token cost |
| --- | --- | --- |
| Full index (900 docs) | 2-5s | 0 LLM tokens |
| Incremental re-index (5 changed) | ~50ms | 0 LLM tokens |
| Search | 5-30ms | ~300-1K tokens |
| Search with facet filters | 2-15ms | ~200-800 tokens |
| Tree outline | <1ms | ~200-800 tokens |

Memory: ~25-50MB for 900 docs with full positional index and facets.
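The README does not say how incremental re-indexing decides what changed. One plausible approach, shown purely as an assumption, is to keep a content hash per file and re-index only the files whose hash differs. FNV-1a over UTF-16 code units is used here just to keep the example free of dependencies; `changedFiles` is a hypothetical helper name.

```typescript
// Sketch: pick files to re-index by comparing content hashes. Hash-based
// change detection is an assumption, not doctree-mcp's documented mechanism.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

type IndexDiff = { changed: string[]; removed: string[]; next: Map<string, number> };

function changedFiles(previous: Map<string, number>, files: Map<string, string>): IndexDiff {
  const next = new Map<string, number>();
  const changed: string[] = [];
  files.forEach((content, path) => {
    const hash = fnv1a(content);
    next.set(path, hash);
    // New files have no previous hash, so they count as changed too.
    if (previous.get(path) !== hash) changed.push(path);
  });

  const removed: string[] = [];
  previous.forEach((_hash, path) => {
    if (!files.has(path)) removed.push(path);
  });
  return { changed, removed, next };
}
```

The returned `next` map is stored and passed back in as `previous` on the following run, so work scales with the number of modified files rather than the size of the corpus.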

Standing on Shoulders

  • PageIndex — Hierarchical tree navigation and the agent reasoning workflow
  • Pagefind by CloudCannon — BM25 scoring, positional index, filter facets, density excerpts, stemming, and more. Full attribution in DESIGN.md.
  • Bun.markdown by Oven — Native CommonMark parser enabling zero-cost tree construction from raw markdown

License

MIT
