io.github.TickTockBent/charlotte

Coding and Debugging

by ticktockbent

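Renders web pages with headless Chromium into structured page representations that agents can read and act on.

Instead of leaving agents to scrape raw HTML, Charlotte uses headless Chromium to render the real page and returns a structured representation, which makes automated analysis and interaction more reliable.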

README

Charlotte

The Web, Readable.

Your AI agent spends more than 60,000 characters just to look at a web page. Charlotte does it in 336.

Charlotte is an MCP server that gives AI agents structured, token-efficient access to the web. Instead of dumping the full accessibility tree on every call, Charlotte returns only what the agent needs: a compact page summary on arrival, targeted queries for specific elements, and full detail only when explicitly requested. The result is 25-182x less data per page compared to Playwright MCP, saving thousands of dollars across production workloads.

Why Charlotte?

Most browser MCP servers dump the entire accessibility tree on every call — a flat text blob that can exceed a million characters on content-heavy pages. Agents pay for all of it whether they need it or not.

Charlotte decomposes each page into a typed, structured representation — landmarks, headings, interactive elements, forms, content summaries — and lets agents control how much they receive with three detail levels. When an agent navigates to a new page, it gets a compact orientation (336 characters for Hacker News) instead of the full element dump (61,000+ characters). When it needs specifics, it asks for them.

Benchmarks

Charlotte v0.6.0 vs Playwright MCP, measured by characters returned per tool call on real websites:

Navigation (first contact with a page):

| Site | Charlotte navigate | Playwright browser_navigate |
| --- | --- | --- |
| example.com | 612 | 817 |
| Wikipedia (AI article) | 7,667 | 1,040,636 |
| Hacker News | 336 | 61,230 |
| GitHub repo | 3,185 | 80,297 |

Charlotte's navigate returns minimal detail by default — landmarks, headings, and interactive element counts grouped by page region. Enough to orient, not enough to overwhelm. On Wikipedia, that's 135x smaller than Playwright's response.
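The size ratios quoted in this section follow directly from the table. As a quick check, the sketch below recomputes them from the published character counts. The type, variable, and function names are illustrative only and are not part of Charlotte.

```typescript
// Character counts per navigation call, copied from the benchmark table above.
interface NavigationBenchmark {
  site: string;
  charlotteChars: number;
  playwrightChars: number;
}

const navigationBenchmarks: NavigationBenchmark[] = [
  { site: "example.com", charlotteChars: 612, playwrightChars: 817 },
  { site: "Wikipedia (AI article)", charlotteChars: 7667, playwrightChars: 1040636 },
  { site: "Hacker News", charlotteChars: 336, playwrightChars: 61230 },
  { site: "GitHub repo", charlotteChars: 3185, playwrightChars: 80297 },
];

// How many times larger the Playwright response is than Charlotte's.
function reductionFactor(row: NavigationBenchmark): number {
  return row.playwrightChars / row.charlotteChars;
}

// Factors rounded to one decimal place.
const reductionBySite: { site: string; factor: number }[] = navigationBenchmarks.map((row) => ({
  site: row.site,
  factor: Math.round(reductionFactor(row) * 10) / 10,
}));
// reductionBySite:
//   example.com             1.3
//   Wikipedia (AI article)  135.7
//   Hacker News             182.2
//   GitHub repo             25.2
```

The 25-182x range quoted in the introduction corresponds to the GitHub and Hacker News rows. On a near-empty page such as example.com the two responses are close in size.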

Tool definition overhead (invisible cost per API call):

| Profile | Tools | Def. tokens/call | Savings vs full |
| --- | --- | --- | --- |
| full | 43 | ~7,600 | |
| browse (default) | 23 | ~3,900 | ~49% |
| core | 7 | 1,677 | ~78% |

Tool definitions are sent on every API round-trip. With the default browse profile, Charlotte carries ~49% less definition overhead than loading all tools. Over a 20-call browsing session, that's ~40% fewer total tokens. See the profile benchmark report for full results.
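The savings column is plain arithmetic on the per-call token figures. The sketch below reproduces it and scales the definition overhead across a session. Names are illustrative, and the session totals cover tool definitions only, so they do not reproduce the ~40% whole-session figure, which also depends on page content.

```typescript
// Approximate tool-definition tokens sent per API call, from the table above.
const definitionTokens = { full: 7600, browse: 3900, core: 1677 };

// Percentage of definition overhead saved relative to the full profile.
function savingsVsFull(profileTokens: number): number {
  return Math.round((1 - profileTokens / definitionTokens.full) * 100);
}

// Definition tokens accumulated over a session, since definitions are
// resent on every round-trip.
function sessionDefinitionTokens(profileTokens: number, calls: number): number {
  return profileTokens * calls;
}

const browseSavings = savingsVsFull(definitionTokens.browse); // 49
const coreSavings = savingsVsFull(definitionTokens.core); // 78
const fullOver20Calls = sessionDefinitionTokens(definitionTokens.full, 20); // 152000
const browseOver20Calls = sessionDefinitionTokens(definitionTokens.browse, 20); // 78000
```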

The workflow difference: Playwright agents receive 61K+ characters every time they look at Hacker News, whether they're reading headlines or looking for a login button. Charlotte agents get 336 characters on arrival, call find({ type: "link", text: "login" }) to get exactly what they need, and never pay for the rest.

How It Works

Charlotte maintains a persistent headless Chromium session and acts as a translation layer between the visual web and the agent's text-native reasoning. Every page is decomposed into a structured representation:

code
┌─────────────┐     MCP Protocol     ┌──────────────────┐
│   AI Agent  │<────────────────────>│    Charlotte     │
└─────────────┘                      │                  │
                                     │  ┌────────────┐  │
                                     │  │  Renderer  │  │
                                     │  │  Pipeline  │  │
                                     │  └─────┬──────┘  │
                                     │        │         │
                                     │  ┌─────▼──────┐  │
                                     │  │  Headless  │  │
                                     │  │  Chromium  │  │
                                     │  └────────────┘  │
                                     └──────────────────┘

Agents receive landmarks, headings, interactive elements with typed metadata, bounding boxes, form structures, and content summaries — all derived from what the browser already knows about every page.

Features

Navigation: navigate, back, forward, reload

Observation: observe (3 detail levels, structural tree view), find (spatial + semantic search, CSS selector mode), screenshot (with persistent artifact management), screenshots, screenshot_get, screenshot_delete, diff (structural comparison against snapshots)

Interaction: click, click_at (coordinate-based), type (with slow typing support), select, toggle, submit, scroll, hover, drag, key (single/sequence with element targeting), wait_for (async condition polling), upload (file input), fill_form (batch form fill), dialog (accept/dismiss JS dialogs)

Monitoring: console (all severity levels, filtering, timestamps), requests (full HTTP history, method/status/resource type filtering)

Session Management: tabs, tab_open, tab_switch, tab_close, viewport (device presets), network (throttling, URL blocking), set_cookies, get_cookies, clear_cookies, set_headers, configure

Development Mode: dev_serve (static server + file watching with auto-reload), dev_inject (CSS/JS injection), dev_audit (a11y, performance, SEO, contrast, broken links)

Utilities: evaluate (arbitrary JS execution in page context)

Tool Profiles

Charlotte ships 43 tools (42 registered + the charlotte_tools meta-tool), but most workflows only need a subset. Startup profiles control which tools load into the agent's context, reducing definition overhead by up to 78%.

bash
charlotte --profile browse    # 23 tools (default) — navigate, observe, interact, tabs
charlotte --profile core      # 7 tools — navigate, observe, find, click, type, submit
charlotte --profile full      # 43 tools — everything
charlotte --profile interact  # 31 tools — full interaction + dialog + evaluate
charlotte --profile develop   # 34 tools — interact + dev_serve, dev_inject, dev_audit
charlotte --profile audit     # 14 tools — navigation + observation + dev_audit + viewport

Agents can activate more tools mid-session without restarting:

code
charlotte_tools enable dev_mode    → activates dev_serve, dev_audit, dev_inject
charlotte_tools disable dev_mode   → deactivates them
charlotte_tools list               → see what's loaded

Quick Start

Prerequisites

  • Node.js >= 20
  • npm

Installation

Charlotte is listed on the MCP Registry as io.github.TickTockBent/charlotte and published on npm as @ticktockbent/charlotte:

bash
npm install -g @ticktockbent/charlotte

Docker images are available on Docker Hub and GitHub Container Registry:

bash
# Alpine (default, smaller)
docker pull ticktockbent/charlotte:alpine

# Debian (if you need glibc compatibility)
docker pull ticktockbent/charlotte:debian

# Or from GHCR
docker pull ghcr.io/ticktockbent/charlotte:latest

Or install from source:

bash
git clone https://github.com/ticktockbent/charlotte.git
cd charlotte
npm install
npm run build

Run

Charlotte communicates over stdio using the MCP protocol:

bash
# If installed globally (default browse profile)
charlotte

# With a specific profile
charlotte --profile core

# If installed from source
npm start

MCP Client Configuration

Claude Code

Create .mcp.json in your project root:

json
{
  "mcpServers": {
    "charlotte": {
      "type": "stdio",
      "command": "npx",
      "args": ["@ticktockbent/charlotte"],
      "env": {}
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

json
{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

Cursor

Add to .cursor/mcp.json:

json
{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

json
{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

VS Code (Copilot)

Add to .vscode/mcp.json:

json
{
  "servers": {
    "charlotte": {
      "type": "stdio",
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

Cline

Add to Cline MCP settings (via the Cline sidebar > MCP Servers > Configure):

json
{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

Amp

Add to ~/.amp/settings.json:

json
{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

See docs/mcp-setup.md for the full setup guide, including development mode, generic MCP clients, verification steps, and troubleshooting.
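The client configurations above launch Charlotte with its default browse profile. One way to pin a different profile is to append the `--profile` flag to the `args` array, since npx forwards arguments placed after the package name to the package's binary. The sketch below builds such a config. It assumes the npx entry point accepts the same flags as the `charlotte` command shown under Tool Profiles, which is not confirmed here, and `buildCharlotteConfig` is a hypothetical helper.

```typescript
// Profiles listed in the Tool Profiles section.
type CharlotteProfile = "browse" | "core" | "full" | "interact" | "develop" | "audit";

interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: { [name: string]: McpServerEntry };
}

// Hypothetical helper: builds an mcpServers-style config that pins a profile.
// Assumes flags placed after the package name reach the charlotte CLI unchanged.
function buildCharlotteConfig(profile: CharlotteProfile): McpClientConfig {
  return {
    mcpServers: {
      charlotte: {
        command: "npx",
        args: ["@ticktockbent/charlotte", "--profile", profile],
      },
    },
  };
}

// Serialized form, suitable for a client config file such as .cursor/mcp.json.
const coreConfigJson: string = JSON.stringify(buildCharlotteConfig("core"), null, 2);
```

Clients that use a different top-level key, such as the `servers` key in the VS Code example, would need the outer object adjusted accordingly.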

Usage Examples

Once connected, an agent can use Charlotte's tools:

Browse a website

code
navigate({ url: "https://example.com" })
// → 612 chars: landmarks, headings, interactive element counts

find({ type: "link", text: "More information" })
// → just the matching element with its ID

click({ element_id: "lnk-a3f1" })

Fill out a form

code
navigate({ url: "https://httpbin.org/forms/post" })
find({ type: "text_input" })
type({ element_id: "inp-c7e2", text: "hello@example.com" })
select({ element_id: "sel-e8a3", value: "option-2" })
submit({ form_id: "frm-b1d4" })

Local development feedback loop

code
dev_serve({ path: "./my-site", watch: true })
observe({ detail: "full" })
dev_audit({ checks: ["a11y", "contrast"] })
dev_inject({ css: "body { font-size: 18px; }" })

Page Representation

Charlotte returns structured representations with three detail levels that let agents control how much context they consume:

Minimal (default for navigate)

Landmarks, headings, and interactive element counts grouped by page region. Designed for orientation — "what's on this page?" — without listing every element.

json
{
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "viewport": { "width": 1280, "height": 720 },
  "structure": {
    "headings": [{ "level": 1, "text": "Hacker News", "id": "h-a1b2" }]
  },
  "interactive_summary": {
    "total": 93,
    "by_landmark": {
      "(page root)": { "link": 91, "text_input": 1, "button": 1 }
    }
  }
}

Summary (default for observe)

Full interactive element list with typed metadata, form structures, and content summaries.

json
{
  "url": "https://example.com/dashboard",
  "title": "Dashboard",
  "viewport": { "width": 1280, "height": 720 },
  "structure": {
    "landmarks": [
      { "id": "rgn-b2c1", "role": "banner", "label": "Site header", "bounds": { "x": 0, "y": 0, "w": 1280, "h": 64 } },
      { "id": "rgn-d4e5", "role": "main", "label": "Content", "bounds": { "x": 240, "y": 64, "w": 1040, "h": 656 } }
    ],
    "headings": [{ "level": 1, "text": "Dashboard", "id": "h-1a2b" }],
    "content_summary": "main: 2 headings, 5 links, 1 form"
  },
  "interactive": [
    {
      "id": "btn-a3f1",
      "type": "button",
      "label": "Create Project",
      "bounds": { "x": 960, "y": 80, "w": 160, "h": 40 },
      "state": {}
    }
  ],
  "forms": []
}

Full

Everything in summary, plus all visible text content on the page.

Detail Levels

| Level | Tokens | Use case |
| --- | --- | --- |
| minimal | ~50-200 | Orientation after navigation. What regions exist? How many interactive elements? |
| summary | ~500-5000 | Working with the page. Full element list, form structures, content summaries. |
| full | variable | Reading page content. All visible text included. |

Navigation tools default to minimal. The observe tool defaults to summary. Both accept an optional detail parameter to override.
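For agents or clients written in TypeScript, the representation can be given a type. The sketch below is inferred only from the two JSON examples above: field names are copied from them, while the type names, which fields are optional, and the two helper functions are assumptions. docs/CHARLOTTE_SPEC.md remains the authoritative description.

```typescript
type DetailLevel = "minimal" | "summary" | "full";

interface Bounds { x: number; y: number; w: number; h: number; }
interface Heading { level: number; text: string; id: string; }
interface Landmark { id: string; role: string; label: string; bounds: Bounds; }

interface InteractiveElement {
  id: string;
  type: string;
  label: string;
  bounds: Bounds;
  state: { [key: string]: unknown };
}

interface PageRepresentation {
  url: string;
  title: string;
  viewport: { width: number; height: number };
  structure: {
    landmarks?: Landmark[];
    headings: Heading[];
    content_summary?: string;
  };
  // Seen in the minimal example: counts only, keyed by landmark, then element type.
  interactive_summary?: {
    total: number;
    by_landmark: { [landmark: string]: { [elementType: string]: number } };
  };
  // Seen in the summary example: one entry per interactive element.
  interactive?: InteractiveElement[];
  forms?: unknown[];
}

// Defaults stated above: observe returns summary, navigation tools return minimal.
function defaultDetail(tool: "navigate" | "back" | "forward" | "reload" | "observe"): DetailLevel {
  return tool === "observe" ? "summary" : "minimal";
}

// Total interactive elements, whichever detail level produced the representation.
function interactiveCount(page: PageRepresentation): number {
  if (page.interactive_summary) {
    return page.interactive_summary.total;
  }
  return page.interactive ? page.interactive.length : 0;
}
```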

Element IDs

Element IDs are stable across minor DOM mutations. They're generated by hashing a composite key of element type, ARIA role, accessible name, and DOM path signature:

code
btn-a3f1  (button)    inp-c7e2  (text input)
lnk-d4b9  (link)      sel-e8a3  (select)
chk-f1a2  (checkbox)  frm-b1d4  (form)
rgn-e0d2  (landmark)  hdg-0f40  (heading)
dom-b2c3  (DOM element, from CSS selector queries)

IDs survive unrelated DOM changes and element reordering within the same container. When an agent navigates at minimal detail (no individual element IDs), it uses find to locate elements by text, type, or spatial proximity — the returned elements include IDs ready for interaction.
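To illustrate why hashing a composite key gives IDs that survive reordering, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the FNV-1a hash, the four-hex-digit suffix, the `|`-joined key layout, the element type names other than button, link, and text_input, and a DOM path signature that omits sibling positions. Charlotte's actual hash function and key format are not described in this README.

```typescript
// Prefixes taken from the ID listing above; the type names used as keys are assumed.
const ID_PREFIXES: { [elementType: string]: string } = {
  button: "btn",
  text_input: "inp",
  link: "lnk",
  select: "sel",
  checkbox: "chk",
  form: "frm",
  landmark: "rgn",
  heading: "hdg",
};

interface ElementDescriptor {
  type: string;
  role: string;
  name: string; // accessible name
  domPath: string; // path signature without sibling indices (assumption)
}

// 32-bit FNV-1a over UTF-16 code units, used here as a stand-in hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts so the result stays exact.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

// Builds an ID such as "btn-xxxx" from the composite key.
function elementId(el: ElementDescriptor): string {
  const prefix = ID_PREFIXES[el.type] || "dom";
  const key = [el.type, el.role, el.name, el.domPath].join("|");
  const hex = ("0000" + (fnv1a(key) & 0xffff).toString(16)).slice(-4);
  return prefix + "-" + hex;
}

const createButton: ElementDescriptor = {
  type: "button",
  role: "button",
  name: "Create Project",
  domPath: "main>div>button",
};
const idBefore = elementId(createButton);
// The same element described again after an unrelated change elsewhere on the page.
const idAfter = elementId({
  type: "button",
  role: "button",
  name: "Create Project",
  domPath: "main>div>button",
});
// idBefore and idAfter are identical because none of the key's inputs changed.
```

Because the key in this sketch contains no sibling index, moving the button within its container leaves the ID unchanged, whereas renaming it would normally produce a different one.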

Development

bash
# Run in watch mode
npm run dev

# Run all tests
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Type check
npx tsc --noEmit

Project Structure

code
src/
  browser/          # Puppeteer lifecycle, tab management, CDP sessions
  renderer/         # Accessibility tree extraction, layout, content, element IDs
  state/            # Snapshot store, structural differ
  tools/            # MCP tool definitions (navigation, observation, interaction, session, dev-mode)
  dev/              # Static server, file watcher, auditor
  types/            # TypeScript interfaces
  utils/            # Logger, hash, wait utilities
tests/
  unit/             # Fast tests with mocks
  integration/      # Full Puppeteer tests against fixture HTML
  fixtures/pages/   # Test HTML files

Architecture

The Renderer Pipeline is the core — it calls extractors in order and assembles a PageRepresentation:

  1. Accessibility tree extraction (CDP Accessibility.getFullAXTree)
  2. Layout extraction (CDP DOM.getBoxModel)
  3. Landmark, heading, interactive element, and content extraction
  4. Element ID generation (hash-based, stable across re-renders)

All tools go through renderActivePage() which handles snapshots, reload events, dialog detection, and response formatting.
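The ordered-extractor design can be pictured as a list of stages that each extend a shared draft. The sketch below is generic: the stage names mirror the numbered list above, but the stage bodies are placeholders that make no CDP calls, and `Draft`, `Stage`, `makeStage`, and `runPipeline` are hypothetical names rather than Charlotte's internals.

```typescript
// Partial result passed from stage to stage.
interface Draft {
  steps: string[];
  data: { [key: string]: unknown };
}

interface Stage {
  name: string;
  run: (draft: Draft) => Draft;
}

// Creates a placeholder stage that records its name and stores one value.
function makeStage(name: string, key: string, value: unknown): Stage {
  return {
    name: name,
    run: (draft: Draft): Draft => ({
      steps: [...draft.steps, name],
      data: { ...draft.data, [key]: value },
    }),
  };
}

// Stage order follows the numbered list above.
const rendererStages: Stage[] = [
  makeStage("accessibility tree", "axTree", "placeholder for Accessibility.getFullAXTree output"),
  makeStage("layout", "boxes", "placeholder for DOM.getBoxModel output"),
  makeStage("content", "elements", "placeholder for landmarks, headings, interactive elements"),
  makeStage("element ids", "ids", "placeholder for hash-based IDs"),
];

// Runs every stage in order, threading the draft through each one.
function runPipeline(stages: Stage[], initial: Draft): Draft {
  return stages.reduce((draft: Draft, current: Stage) => current.run(draft), initial);
}

const pipelineResult = runPipeline(rendererStages, { steps: [], data: {} });
// pipelineResult.steps lists the four stage names in the order they ran.
```

In this sketch each stage returns a new draft instead of mutating the old one, so a later stage can read what an earlier one produced without being able to change it.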

Sandbox

Charlotte includes a test website in tests/sandbox/ that exercises all tools without touching the public internet. Serve it locally with:

code
dev_serve({ path: "tests/sandbox" })

Five pages cover navigation, forms, interactive elements, popups, delayed content, scroll containers, and more. See docs/sandbox.md for the full page reference and a tool-by-tool exercise checklist.

Known Issues

Shadow DOM — Open shadow DOM works transparently. Chromium's accessibility tree pierces open shadow boundaries, so web components (e.g., GitHub's <relative-time>, <tool-tip>) render their content into Charlotte's representation without special handling. Closed shadow roots are opaque to the accessibility tree and will not be captured.

Roadmap

Session & Configuration

Connect to Existing Browser — Add a --cdp-endpoint CLI argument so Charlotte can attach to an already-running browser via puppeteer.connect() instead of always launching a new instance. Enables working with logged-in sessions and browser extensions.

Persistent Init Scripts — Add a --init-script CLI argument to inject JavaScript on every page load via page.evaluateOnNewDocument(). Charlotte's dev_inject currently applies CSS/JS once and does not persist across navigations.

Configuration File — Support a --config CLI argument to load settings from a JSON file, simplifying repeatable setups and CI/CD integration.

Full Device Emulation — Extend charlotte_viewport to accept named devices (e.g., "iPhone 15") and configure user agent, touch support, and device pixel ratio via CDP, not just viewport dimensions.

Feature Roadmap

Video Recording — Record interactions as video, capturing the full sequence of agent-driven navigation and manipulation for debugging, documentation, and review.

ARM64 Docker Images — Add linux/arm64 platform support to the Docker publish workflow for native performance on Apple Silicon Macs and ARM servers.

See docs/playwright-mcp-gap-analysis.md for the full gap analysis against Playwright MCP, including lower-priority items (vision tools, testing/verification, tracing, transport, security) and areas where Charlotte has advantages.

Full Specification

See docs/CHARLOTTE_SPEC.md for the complete specification including all tool parameters, the page representation format, element identity strategy, and architecture details.

License

MIT

Community

  • Open a bug report for reproducible defects, regressions, or MCP-client-specific problems.
  • Open a feature request for workflow improvements or new capabilities.
  • Open a tool request if you want to propose a new tool, parameter surface, or profile placement.
  • Browse open issues to find current work and discussion.
  • Check the good first issue filter for starter-friendly tasks as maintainers tag them.

Contributing

See CONTRIBUTING.md for guidelines.

Part of a growing suite of literary-named MCP servers. See more at github.com/TickTockBent.
