io.github.hidai25/evalview-mcp

Coding & Debugging

by hidai25

A regression testing tool for AI agents, with golden baselines, CI/CD support, and compatibility with LangGraph, CrewAI, OpenAI, and Claude.

What is io.github.hidai25/evalview-mcp?

A regression testing tool for AI agents, with golden baselines, CI/CD support, and compatibility with LangGraph, CrewAI, OpenAI, and Claude.

README

<!-- mcp-name: io.github.hidai25/evalview-mcp --> <!-- keywords: AI agent testing, regression detection, golden baselines --> <p align="center"> <img src="assets/logo.png" alt="EvalView" width="350"> <br> <strong>The open-source behavior regression gate for AI agents.</strong><br> Think Playwright, but for tool-calling and multi-turn AI agents. </p> <p align="center"> <a href="https://pypi.org/project/evalview/"><img src="https://img.shields.io/pypi/v/evalview.svg?label=release" alt="PyPI version"></a> <a href="https://pypi.org/project/evalview/"><img src="https://img.shields.io/pypi/dm/evalview.svg?label=downloads" alt="PyPI downloads"></a> <a href="https://github.com/hidai25/eval-view/stargazers"><img src="https://img.shields.io/github/stars/hidai25/eval-view?style=social" alt="GitHub stars"></a> <a href="https://github.com/hidai25/eval-view/actions/workflows/ci.yml"><img src="https://github.com/hidai25/eval-view/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"></a> <a href="https://github.com/hidai25/eval-view/graphs/contributors"><img src="https://img.shields.io/github/contributors/hidai25/eval-view" alt="Contributors"></a> </p>

Your agent can still return 200 and be wrong. A model or provider update can change tool choice, skip a clarification, or degrade output quality without changing your code or breaking a health check. EvalView catches those silent regressions before users do.

You don't need frontier-lab resources to run a serious agent regression loop. EvalView gives solo devs, startups, and small AI teams the same core discipline: snapshot behavior, detect drift, classify changes, and review or heal them safely.

Traditional tests tell you if your agent is up. EvalView tells you if it still behaves correctly. It tracks drift across outputs, tools, model IDs, and runtime fingerprints, so you can tell "the provider changed" from "my system regressed."

demo.gif (30-second live demo)

Most eval tools stop at detect and compare. EvalView helps you classify changes, inspect drift, and auto-heal the safe cases.

  • Catch silent regressions that normal tests miss
  • Separate provider/model drift from real system regressions
  • Auto-heal flaky failures with retries, review gates, and audit logs

Built for frontier-lab rigor, startup-team practicality:

  • targeted behavior runs instead of giant always-on eval suites
  • deterministic diffs first, LLM judgment where it adds signal
  • faster loops from change -> eval -> review -> ship

How we run EvalView with this operating model →

code
  ✓ login-flow           PASSED
  ⚠ refund-request       TOOLS_CHANGED
      - lookup_order → check_policy → process_refund
      + lookup_order → check_policy → process_refund → escalate_to_human
  ✗ billing-dispute      REGRESSION  -30 pts
      Score: 85 → 55  Output similarity: 35%
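The status lines above come from diffing each test's current tool sequence against its golden baseline. A minimal sketch of that idea, under my own simplified rules rather than EvalView's actual implementation:

```python
def classify_tool_diff(baseline: list[str], current: list[str]) -> str:
    """Classify one test run against its baseline tool sequence (illustrative)."""
    if current == baseline:
        return "PASSED"
    # Tools were added, removed, or called in a different order.
    return "TOOLS_CHANGED"

def tool_diff_lines(baseline: list[str], current: list[str]) -> list[str]:
    """Render a unified-diff-style view of the two sequences."""
    lines = []
    if baseline != current:
        lines.append("- " + " -> ".join(baseline))
        lines.append("+ " + " -> ".join(current))
    return lines

baseline = ["lookup_order", "check_policy", "process_refund"]
current = baseline + ["escalate_to_human"]
print(classify_tool_diff(baseline, current))  # TOOLS_CHANGED
print("\n".join(tool_diff_lines(baseline, current)))
```

The real tool also diffs tool parameters and output scores; this only captures the sequence comparison.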

Quick Start

bash
pip install evalview
bash
evalview init        # Detect agent, auto-configure profile + starter suite
evalview snapshot    # Save current behavior as baseline
evalview check       # Catch regressions after every change

That's it. Three commands to regression-test any AI agent. init auto-detects your agent type (chat, tool-use, multi-step, RAG, coding) and configures the right evaluators, thresholds, and assertions.

<details> <summary><strong>Other install methods</strong></summary>
bash
curl -fsSL https://raw.githubusercontent.com/hidai25/eval-view/main/install.sh | bash
</details> <details> <summary><strong>No agent yet? Try the demo</strong></summary>
bash
evalview demo        # See regression detection live (~30 seconds, no API key)

Or clone a real working agent with built-in tests:

bash
git clone https://github.com/hidai25/evalview-support-automation-template
cd evalview-support-automation-template
make run
</details> <details> <summary><strong>More entry paths</strong></summary>
bash
evalview generate --agent http://localhost:8000           # Generate tests from a live agent
evalview capture --agent http://localhost:8000/invoke      # Capture real user flows (runs assertion wizard after)
evalview capture --agent http://localhost:8000/invoke --multi-turn  # Multi-turn conversation as one test
evalview generate --from-log traffic.jsonl                # Generate from existing logs
evalview init --profile rag                               # Override auto-detected agent profile
</details>

Why EvalView?

Use LangSmith for observability. Use Braintrust for scoring. Use EvalView for regression gating.

| | LangSmith | Braintrust | Promptfoo | EvalView |
|---|---|---|---|---|
| Primary focus | Observability | Scoring | Prompt comparison | Regression detection |
| Tool call + parameter diffing | | | | Yes |
| Golden baseline regression | | | Manual | Automatic |
| Silent model change detection | | | | Yes |
| Auto-heal (retry + variant proposal) | | | | Yes |
| PR comments with alerts | | | | Cost, latency, model change |
| Works without API keys | No | No | Partial | Yes |
| Production monitoring | Tracing | | | Check loop + Slack |

Detailed comparisons →

What It Catches

| Status | Meaning | Action |
|---|---|---|
| ✅ PASSED | Behavior matches baseline | Ship with confidence |
| ⚠️ TOOLS_CHANGED | Different tools called | Review the diff |
| ⚠️ OUTPUT_CHANGED | Same tools, output shifted | Review the diff |
| ❌ REGRESSION | Score dropped significantly | Fix before shipping |

Model / Runtime Change Detection

EvalView does more than compare model_id.

  • Declared model change: adapter-reported model changed from baseline
  • Runtime fingerprint change: observed model labels in the trace changed, even when the top-level model name is missing
  • Coordinated drift: multiple tests shift together in the same check run, which often points to a silent provider rollout or runtime change

When detected, evalview check surfaces a run-level signal with a classification (declared or suspected), confidence level, and evidence from fingerprints, retries, and affected tests.

If the new behavior is correct, rerun evalview snapshot to accept the updated baseline.
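The coordinated-drift signal boils down to a share test: when enough tests in one check run shift at once, a provider-side change is the more likely explanation than isolated flakiness. A hypothetical sketch (the threshold and labels are my assumptions, not EvalView's actual values):

```python
def classify_run_drift(changed: int, total: int, threshold: float = 0.5) -> str:
    """Flag a check run as suspected provider drift when many tests move together."""
    if total == 0 or changed == 0:
        return "stable"
    share = changed / total
    return "suspected_provider_change" if share >= threshold else "isolated_drift"

print(classify_run_drift(changed=4, total=5))  # suspected_provider_change
print(classify_run_drift(changed=1, total=5))  # isolated_drift
```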

Four scoring layers — the first two are free and offline:

| Layer | What it checks | Cost |
|---|---|---|
| Tool calls + sequence | Exact tool names, order, parameters | Free |
| Code-based checks | Regex, JSON schema, contains/not_contains | Free |
| Semantic similarity | Output meaning via embeddings | ~$0.00004/test |
| LLM-as-judge | Output quality scored by LLM (GPT, Claude, Gemini, DeepSeek, Ollama) | ~$0.01/test |
code
Score Breakdown
  Tools 100% ×30%    Output 42/100 ×50%    Sequence ✓ ×20%    = 54/100
  ↑ tools were fine   ↑ this is the problem
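A score breakdown like the one above is a weighted sum of per-layer scores. An illustrative sketch: the weights mirror the ×30/×50/×20 split shown, but the layer scores here are made-up inputs, not EvalView's real computation:

```python
def weighted_score(layer_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-layer scores (0-100) into a single 0-100 result."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(layer_scores[k] * weights[k] for k in weights)

score = weighted_score(
    {"tools": 100, "output": 40, "sequence": 100},
    {"tools": 0.30, "output": 0.50, "sequence": 0.20},
)
print(score)  # 70.0 — a low output score drags the total down
```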

CI/CD Integration

Block broken agents in every PR. One step — PR comments, artifacts, and job summary are automatic.

yaml
# .github/workflows/evalview.yml — copy this, add your secret, done
name: EvalView Agent Check
on: [pull_request, push]

jobs:
  agent-check:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Check for agent regressions
        uses: hidai25/eval-view@main
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
<details> <summary><strong>What lands on your PR</strong></summary>
code
## ✅ EvalView: PASSED

| Metric | Value |
|--------|-------|
| Tests | 5/5 unchanged (100%) |

---
*Generated by EvalView*

When something breaks:

code
## ❌ EvalView: REGRESSION

> **Alerts**
> - 💸 Cost spike: $0.02 → $0.08 (+300%)
> - 🤖 Model changed: gpt-5.4 → gpt-5.4-mini

| Metric | Value |
|--------|-------|
| Tests | 3/5 unchanged (60%) |
| Regressions | 1 |
| Tools Changed | 1 |

### Changes from Baseline
- ❌ **search-flow**: score -15.0, 1 tool change(s)
- ⚠️ **create-flow**: 1 tool change(s)
</details>

Common options: strict: 'true' | fail-on: 'REGRESSION,TOOLS_CHANGED' | mode: 'run' | filter: 'my-test'
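The fail-on option is a comma-separated list of statuses that should fail the job. A hypothetical sketch of that gating logic (function name and parsing are my assumptions):

```python
def should_fail(statuses: list[str], fail_on: str = "REGRESSION") -> bool:
    """Return True if any observed test status is in the fail-on list."""
    gate = {s.strip() for s in fail_on.split(",") if s.strip()}
    return any(s in gate for s in statuses)

print(should_fail(["PASSED", "TOOLS_CHANGED"], "REGRESSION,TOOLS_CHANGED"))  # True
print(should_fail(["PASSED", "TOOLS_CHANGED"], "REGRESSION"))                # False
```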

Also works with pre-push hooks (evalview install-hooks) and status badges (evalview badge).

Full CI/CD guide →

Watch Mode

Leave it running while you code. Every file save triggers a regression check.

bash
evalview watch                          # Watch current dir, check on change
evalview watch --quick                  # No LLM judge — $0, sub-second
evalview watch --test "refund-flow"     # Only check one test
code
╭─────────────────────────── EvalView Watch ────────────────────────────╮
│   Watching   .                                                        │
│   Tests      all in tests/                                            │
│   Mode       quick (no judge, $0)                                     │
╰───────────────────────────────────────────────────────────────────────╯

14:32:07  Change detected: src/agent.py

╭──────────────────────────── Scorecard ────────────────────────────────╮
│ ████████████████████░░░░  4 passed · 1 tools changed · 0 regressions │
╰───────────────────────────────────────────────────────────────────────╯
  ⚠ TOOLS_CHANGED  refund-flow  1 tool change(s)

Watching for changes...

Multi-Turn Testing

Most eval tools handle single-turn well. EvalView is built for multi-turn — clarification paths, follow-up handling, and tool use across conversations.

yaml
name: refund-needs-order-number
turns:
  - query: "I want a refund"
    expected:
      output:
        contains: ["order number"]
  - query: "Order 4812"
    expected:
      tools: ["lookup_order", "check_policy"]
      forbidden_tools: ["delete_order"]
      output:
        contains: ["refund", "processed"]
        not_contains: ["error"]
thresholds:
  min_score: 70

Each turn is scored independently with full conversation context: per-turn judge scoring, not just a score on the final response.
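The deterministic part of a per-turn check like the YAML above can be sketched as a small validator. This is a hypothetical helper, not EvalView's API:

```python
def check_turn(output: str, tools_called: list[str], expected: dict) -> list[str]:
    """Return the list of failed assertions for one conversation turn."""
    failures = []
    for s in expected.get("output", {}).get("contains", []):
        if s not in output:
            failures.append(f"missing substring: {s!r}")
    for s in expected.get("output", {}).get("not_contains", []):
        if s in output:
            failures.append(f"forbidden substring present: {s!r}")
    for t in expected.get("tools", []):
        if t not in tools_called:
            failures.append(f"expected tool not called: {t}")
    for t in expected.get("forbidden_tools", []):
        if t in tools_called:
            failures.append(f"forbidden tool called: {t}")
    return failures

expected = {
    "tools": ["lookup_order", "check_policy"],
    "forbidden_tools": ["delete_order"],
    "output": {"contains": ["refund", "processed"], "not_contains": ["error"]},
}
print(check_turn("Your refund was processed for order 4812.",
                 ["lookup_order", "check_policy"], expected))  # []
```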

Smart DX

EvalView doesn't just run tests — it understands your agent and configures itself.

Assertion Wizard — Tests From Real Traffic

Capture real interactions, get pre-configured tests. No YAML writing.

bash
evalview capture --agent http://localhost:8000/invoke
# Use your agent normally, then Ctrl+C
code
Assertion Wizard — analyzing 8 captured interactions

  Agent type detected: multi-step
  Tools seen          search, extract, summarize
  Consistent sequence search -> extract -> summarize

  Suggested assertions:
    1. Lock tool sequence: search -> extract -> summarize  (recommended)
    2. Require tools: search, extract, summarize           (recommended)
    3. Max latency: 5000ms                                 (recommended)
    4. Minimum quality score: 70                           (recommended)

  Accept all recommended? [Y/n]: y
  Applied 4 assertions to 8 test files

Auto-Variant Discovery — Solve Non-Determinism

Non-deterministic agents take different valid paths. Let EvalView discover and save them:

bash
evalview check --statistical 10 --auto-variant
code
  search-flow  mean: 82.3, std: 8.1, flakiness: low_variance
    1. search -> extract -> summarize  (7/10 runs, avg score: 85.2)
    2. search -> summarize             (3/10 runs, avg score: 78.1)

    Save as golden variant? [Y/n]: y
    Saved variant 'auto-v1': search -> summarize

Run N times. Cluster the paths. Save the valid ones. Tests stop being flaky — automatically.
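The clustering step can be as simple as grouping runs by exact tool sequence and keeping the sequences that occur often enough. An illustrative sketch (the frequency cutoff is my assumption):

```python
from collections import Counter

def discover_variants(runs: list[tuple[str, ...]], min_share: float = 0.2):
    """Group runs by tool sequence; keep sequences seen in >= min_share of runs."""
    counts = Counter(runs)
    total = len(runs)
    return [(list(seq), n) for seq, n in counts.most_common()
            if n / total >= min_share]

# 10 statistical runs: two distinct valid paths, matching the output above
runs = [("search", "extract", "summarize")] * 7 + [("search", "summarize")] * 3
for seq, n in discover_variants(runs):
    print(" -> ".join(seq), f"({n}/{len(runs)} runs)")
```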

Auto-Heal — Fix Flakes Without Leaving CI

Model got silently updated? Output drifted? --heal retries safe failures, proposes variants for borderline cases, and hard-escalates everything else. It also records when those retries were triggered by a likely model/runtime update.

bash
evalview check --heal
code
  ⚠ Model update detected: gpt-5-2025-08-07 → gpt-5.1-2025-11-12 (3 tests affected)

  ✓ login-flow           PASSED
  ⚡ refund-request       HEALED   retried — non-deterministic drift
  ⚡ order-lookup         HEALED   retried — likely model/runtime update
  ◈ billing-dispute      PROPOSED saved candidate variant auto_heal_a1b2 (score 72)
  ⚠ search-flow          REVIEW   tool removed: web_search
  ✗ safety-check         BLOCKED  forbidden tool called — cannot heal

  3 resolved, 1 candidate variant saved, 1 needs review, 1 blocked.
  Model update: 2 of 3 affected tests healed via retry. Run `evalview snapshot` to rebase.
  Audit log: .evalview/healing/2026-03-25T14-30-00.json

Decision policy: Retry when tools match but output drifted (non-determinism or likely model/runtime update). Propose a variant when retry fails but score is acceptable. Never auto-resolve structural changes, forbidden tool violations, cost spikes, or score improvements. Full audit trail in .evalview/healing/.

Exit code: 0 only when every failure was resolved via retry. Proposed variants, reviews, and blocks always exit 1 — CI stays honest.
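The decision policy above reads as a small rule table. A sketch under the stated rules, with assumed names, ordering, and threshold, not EvalView's code:

```python
def heal_decision(tools_match: bool, forbidden_tool: bool,
                  retry_passed: bool, score: float, min_score: float = 70) -> str:
    """Map one failing test to a healing action per the policy (illustrative)."""
    if forbidden_tool:
        return "BLOCKED"   # safety violations are never auto-resolved
    if not tools_match:
        return "REVIEW"    # structural changes always go to a human
    if retry_passed:
        return "HEALED"    # output drift resolved by retry
    if score >= min_score:
        return "PROPOSED"  # retry failed but score acceptable: candidate variant
    return "REVIEW"

print(heal_decision(tools_match=True, forbidden_tool=False,
                    retry_passed=True, score=90))   # HEALED
print(heal_decision(tools_match=True, forbidden_tool=False,
                    retry_passed=False, score=72))  # PROPOSED
```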

<details> <summary><strong>Budget circuit breaker + Smart eval profiles</strong></summary>

Budget circuit breaker — enforced mid-execution, not post-hoc:

bash
evalview check --budget 0.50
code
  $0.12 (24%) — search-flow
  $0.09 (18%) — refund-flow
  $0.31 (62%) — billing-dispute

  Budget circuit breaker tripped: $0.52 spent of $0.50 limit
  2 test(s) skipped to stay within budget
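Mid-execution enforcement means checking spend before each test rather than totting up costs afterwards. A sketch of that loop, with the costs above as made-up inputs:

```python
def run_with_budget(test_costs: dict[str, float], budget: float):
    """Run tests in order, skipping the rest once spend reaches the budget."""
    spent, ran, skipped = 0.0, [], []
    for name, cost in test_costs.items():
        if spent >= budget:
            skipped.append(name)  # breaker already tripped: skip remaining tests
            continue
        spent += cost
        ran.append(name)
    return spent, ran, skipped

spent, ran, skipped = run_with_budget(
    {"search-flow": 0.12, "refund-flow": 0.09, "billing-dispute": 0.31,
     "login-flow": 0.05, "safety-check": 0.04},
    budget=0.50,
)
print(f"${spent:.2f} spent of $0.50 limit, {len(skipped)} test(s) skipped")
```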

Smart eval profilesevalview init detects your agent type and pre-configures evaluators:

Five profiles — chat, tool-use, multi-step, rag, coding — each with tailored thresholds, recommended checks, and actionable tips. Override with --profile rag.

</details>

Supported Frameworks

Works with LangGraph, CrewAI, OpenAI, Claude, Mistral, HuggingFace, Ollama, MCP, and any HTTP API.

Per-adapter E2E testing and trace capture support:

  • LangGraph
  • CrewAI
  • OpenAI Assistants
  • Claude Code
  • OpenClaw
  • Ollama
  • Any HTTP API

Framework details → | Flagship starter → | Starter examples →

How It Works

code
┌────────────┐      ┌──────────┐      ┌───────────────┐
│ Test Cases │ ──→  │ EvalView │ ──→  │  Your Agent   │
│   (YAML)   │      │          │ ←──  │ local / cloud │
└────────────┘      └──────────┘      └───────────────┘
  1. evalview init — detects your running agent, creates a starter test suite
  2. evalview snapshot — runs tests, saves traces as baselines
  3. evalview check — replays tests, diffs against baselines, opens HTML report
  4. evalview watch — re-runs checks on every file save
  5. evalview monitor — continuous checks in production with Slack alerts
<details> <summary><strong>Snapshot management</strong></summary>
bash
evalview snapshot list              # See all saved baselines
evalview snapshot show "my-test"    # Inspect a baseline
evalview snapshot delete "my-test"  # Remove a baseline
evalview snapshot --preview         # See what would change without saving
evalview snapshot --reset           # Clear all and start fresh
evalview replay                     # List tests, or: evalview replay "my-test"
</details>

Your data stays local by default. Nothing leaves your machine unless you opt in to cloud sync via evalview login.

Production Monitoring

bash
evalview monitor                                         # Check every 5 min
evalview monitor --dashboard                             # Live terminal dashboard
evalview monitor --slack-webhook https://hooks.slack.com/services/...
evalview monitor --history monitor.jsonl                 # JSONL for dashboards

New regressions trigger Slack alerts. Recoveries send all-clear. No spam on persistent failures.
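That "no spam" behavior is edge-triggered alerting: notify on state transitions, stay silent while a failure persists. A sketch of the idea (not EvalView's monitor code):

```python
def alerts(results: list[bool]) -> list[str]:
    """Emit an alert only when a check flips state: fail -> alert, recover -> all-clear."""
    out, prev = [], True  # assume healthy before the first check
    for i, passed in enumerate(results):
        if passed != prev:
            out.append(f"check {i}: " + ("recovered" if passed else "regression"))
        prev = passed
    return out

# Three consecutive failures produce one alert, then one all-clear on recovery.
print(alerts([True, False, False, False, True]))
```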

Monitor config options →

Key Features

| Feature | Description | Docs |
|---|---|---|
| Assertion wizard | Analyze captured traffic, suggest smart assertions automatically | Above |
| Auto-variant discovery | Run N times, cluster paths, save valid variants | Above |
| Auto-heal | Retry flakes, propose variants, escalate structural changes | Above |
| Budget circuit breaker | Mid-execution budget enforcement with per-test cost breakdown | Above |
| Smart eval profiles | Auto-detect agent type, pre-configure evaluators | Above |
| Baseline diffing | Tool call + parameter + output regression detection | Docs |
| Multi-turn testing | Per-turn tool, forbidden_tools, and output checks | Docs |
| Multi-reference baselines | Up to 5 variants for non-deterministic agents | Docs |
| forbidden_tools | Safety contracts — hard-fail on any violation | Docs |
| Watch mode | evalview watch — re-run checks on file save, with dashboard | Docs |
| Model comparison | run_eval / compare_models — test one query across N models in parallel | Docs |
| Python API | gate() / gate_async() — programmatic regression checks | Docs |
| PR comments + alerts | Cost/latency spikes, model changes, collapsible diffs | Docs |
| Terminal dashboard | Scorecard, sparkline trends, confidence scoring | |
<details> <summary><strong>All features</strong></summary>
| Feature | Description | Docs |
|---|---|---|
| Multi-turn capture | capture --multi-turn records conversations as tests | Docs |
| Semantic similarity | Embedding-based output comparison | Docs |
| Production monitoring | evalview monitor --dashboard with Slack alerts and JSONL history | Docs |
| A/B comparison | evalview compare --v1 <url> --v2 <url> | Docs |
| Test generation | evalview generate — discovers your agent's domain, generates relevant tests | Docs |
| Per-turn judge scoring | Multi-turn output quality scored per turn with conversation context | Docs |
| Silent model detection | Alerts when LLM provider updates the model version | Docs |
| Gradual drift detection | Trend analysis across check history | Docs |
| Statistical mode (pass@k) | Run N times, require a pass rate, auto-discover variants | Docs |
| HTML trace replay | Auto-opens after check with full trace details | Docs |
| Verified cost tracking | Per-test cost breakdown with model pricing rates | Docs |
| Judge model picker | Choose GPT, Claude, Gemini, DeepSeek, or Ollama (free) | Docs |
| Pytest plugin | evalview_check fixture for standard pytest | Docs |
| Model comparison | run_eval / compare_models — parametrize tests across models, auto-detect provider | Docs |
| GitHub Actions job summary | Results visible in Actions UI, not just PR comments | Docs |
| Git hooks | Pre-push regression blocking, zero CI config | Docs |
| LLM judge caching | ~80% cost reduction in statistical mode | Docs |
| Quick mode | gate(quick=True) — no judge, $0, sub-second | Docs |
| OpenClaw integration | Regression gate skill + gate_or_revert() helpers | Docs |
| Snapshot preview | evalview snapshot --preview — dry-run before saving | |
| Skills testing | E2E testing for Claude Code, Codex, OpenClaw | Docs |
</details>

Python API

Use EvalView as a library — no CLI, no subprocess, no output parsing.

python
from evalview import gate, DiffStatus

result = gate(test_dir="tests/")

result.passed          # bool — True if no regressions
result.status          # DiffStatus.PASSED / REGRESSION / TOOLS_CHANGED
result.summary         # .total, .unchanged, .regressions, .tools_changed
result.diffs           # List[TestDiff] — per-test scores and tool diffs
<details> <summary><strong>Quick mode, async, and autonomous loops</strong></summary>

Quick mode — skip the LLM judge for free, sub-second checks:

python
result = gate(test_dir="tests/", quick=True)  # deterministic only, $0

Async — for agent frameworks already in an event loop:

python
result = await gate_async(test_dir="tests/")

Autonomous loops — gate + auto-revert on regression:

python
from evalview.openclaw import gate_or_revert

make_code_change()
if not gate_or_revert("tests/", quick=True):
    # Change was reverted — try a different approach
    try_alternative()
</details>

OpenClaw Integration

Use EvalView as a regression gate in autonomous agent loops.

bash
evalview openclaw install                    # Install gate skill into workspace
evalview openclaw check --path tests/        # Check and auto-revert on regression
<details> <summary><strong>Python API for autonomous loops</strong></summary>
python
from evalview.openclaw import gate_or_revert

make_code_change()
if not gate_or_revert("tests/", quick=True):
    try_alternative()  # Change was reverted
</details>

Pytest Plugin

python
def test_weather_regression(evalview_check):
    diff = evalview_check("weather-lookup")
    assert diff.overall_severity.value != "regression", diff.summary()
bash
pip install evalview    # Plugin registers automatically
pytest                  # Runs alongside your existing tests

Model Comparison

Test the same task across multiple models with one parametrized test. No config files — just a model name and a query.

python
import pytest
import evalview

@pytest.mark.parametrize("model", ["claude-opus-4-6", "gpt-4o", "claude-sonnet-4-6"])
def test_my_task(model):
    result = evalview.run_eval(model, query="Summarize this contract in one sentence.")
    assert evalview.score(result) > 0.8

Provider is auto-detected from the model name. Requires ANTHROPIC_API_KEY / OPENAI_API_KEY depending on which models you use.
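Provider auto-detection presumably keys off model-name prefixes. A hypothetical sketch of that heuristic (the real mapping may differ):

```python
def detect_provider(model: str) -> str:
    """Guess the API provider from the model name (illustrative heuristic)."""
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith(("gpt", "o1", "o3")):
        return "openai"
    raise ValueError(f"unknown provider for model {model!r}")

print(detect_provider("claude-opus-4-6"))  # anthropic
print(detect_provider("gpt-4o"))           # openai
```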

Score against expected output — token-overlap similarity, no LLM judge needed:

python
result = evalview.run_eval(
    "gpt-4o",
    query="What language is Python?",
    expected="Python is a high-level interpreted language.",
    threshold=0.4,
)
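Token-overlap similarity can be as simple as a Jaccard overlap of lowercase token sets; a sketch of the idea (EvalView's exact formula may differ):

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-split token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

sim = token_overlap(
    "python is a high level interpreted language",
    "python is an interpreted high level language",
)
passed = sim >= 0.4  # compare against the test's threshold
print(round(sim, 2))
```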

Custom scorer — assert specific behavior:

python
def has_json(output, expected):
    """Score 1.0 if the output contains a parseable, non-empty JSON object."""
    import json, re
    m = re.search(r"\{.*?\}", output, re.DOTALL)
    if not m:
        return 0.0
    try:
        return 1.0 if json.loads(m.group()) else 0.0
    except json.JSONDecodeError:
        return 0.0

result = evalview.run_eval("claude-opus-4-6", query="Return JSON: {name, age}", scorer=has_json)
assert evalview.score(result) == 1.0

Run all models in parallel and compare:

python
results = evalview.compare_models(
    query="Explain quantum entanglement in one sentence.",
    models=["claude-opus-4-6", "gpt-4o", "claude-sonnet-4-6"],
)
evalview.print_comparison_table(results)   # Rich table: score, latency, cost
best = results[0]                          # sorted best-first
code
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ Model              ┃ Score ┃  Latency ┃      Cost ┃ Pass? ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ claude-opus-4-6    │  1.00 │    842ms │ $0.00312  │   ✓   │
│ gpt-4o             │  1.00 │    631ms │ $0.00087  │   ✓   │
│ claude-sonnet-4-6  │  1.00 │    514ms │ $0.00063  │   ✓   │
└────────────────────┴───────┴──────────┴───────────┴───────┘

ModelResult fields: .model, .output, .score, .latency_ms, .cost_usd, .passed, .error

Full example →

Claude Code (MCP)

bash
claude mcp add --transport stdio evalview -- evalview mcp serve

8 tools: create_test, run_snapshot, run_check, list_tests, validate_skill, generate_skill_tests, run_skill_test, generate_visual_report

<details> <summary><strong>MCP setup details</strong></summary>
bash
# 1. Install
pip install evalview

# 2. Connect to Claude Code
claude mcp add --transport stdio evalview -- evalview mcp serve

# 3. Make Claude Code proactive
cp CLAUDE.md.example CLAUDE.md

Then just ask Claude: "did my refactor break anything?" and it runs run_check inline.

</details>

Agent-Friendly Docs

Works with your coding agent out of the box. Ask Cursor, Claude Code, or Copilot to add regression tests, build a new adapter, or debug a failing check — EvalView ships the architecture maps and task recipes they need to get it right on the first try.

Documentation

| Getting Started | Core Features | Integrations |
|---|---|---|
| Getting Started | Golden Traces | CI/CD |
| CLI Reference | Evaluation Metrics | MCP Contracts |
| Agent Instructions | Agent Recipes | Ollama Recipe |
| FAQ | Test Generation | Skills Testing |
| YAML Schema | Statistical Mode | Chat Mode |
| Framework Support | Behavior Coverage | Debugging |

Contributing

License: Apache 2.0


Star History

Star History Chart

FAQ

What is io.github.hidai25/evalview-mcp?

A regression testing tool for AI agents, with golden baselines, CI/CD support, and compatibility with LangGraph, CrewAI, OpenAI, and Claude.

Related Skills

Frontend Design

by anthropics

Universal
Popular

For component, page, poster, and web app development: generates production-ready frontend code and polished UI in a distinctive visual direction, suited to landing pages, dashboards, or refreshing existing interfaces while avoiding the generic AI aesthetic.

If you want pages that can both ship and feel designed, use Frontend Design: it produces everything from single components to full sites, and, unusually, avoids the cookie-cutter AI look.

Coding & Debugging
Not scanned · 111.1k

Web Builder

by anthropics

Universal
Popular

For complex claude.ai HTML artifact development: quickly scaffolds a React + Tailwind CSS + shadcn/ui project and bundles it into single-file HTML, suited to pages that need state management, routing, or multi-component interaction.

Building complex web artifacts in claude.ai becomes painless: multi-component state and routing come together smoothly, and the React, Tailwind, and shadcn/ui stack is efficient and yields more polished results.

Coding & Debugging
Not scanned · 111.1k

Web App Testing

by anthropics

Universal
Popular

Writes automated Playwright tests for local web apps: can start dev servers, verify frontend interactions, troubleshoot UI issues, and capture screenshots and browser logs, suited to debugging dynamic pages and regression verification.

One-stop verification of local web app frontends with Playwright; while debugging the UI you can also inspect logs and screenshots, making issues faster to pinpoint.

Coding & Debugging
Not scanned · 111.1k

Related MCP Servers

GitHub

Editor's Pick

by GitHub

Popular

GitHub is the official MCP reference server, letting Claude read and write your repositories and Issues directly.

This reference server solves the problem of developers wanting AI to access GitHub data safely, and suits teams that need automated code review or Issue management. Note that it is only a reference implementation; for production you will need to harden security yourself.

Coding & Debugging
83.0k

by Context7

Popular

Context7 is a smart assistant that pulls the latest documentation and code examples in real time, so you can stop relying on stale material.

It solves the problem of outdated information when developers look things up, and is especially useful for getting started with a new library or tracking updates. That said, relying on external sources can cause occasional data lag, so use it alongside the official docs.

Coding & Debugging
51.7k

by tldraw

Popular

tldraw is an MCP server that lets AI assistants draw and collaborate directly on an infinite canvas.

This addresses the pain point of AI being text-only with no visual collaboration: imagine Claude drawing a flowchart for you or working through a whiteboard discussion. It best suits developers who need rapid prototyping or brainstorming. For now, though, it is just a basic connector; you have to build the canvas app yourself to unlock its full potential.

Coding & Debugging
46.2k

Comments