io.github.runbook-ai/browser-agent
AI & Agents · by runbook-ai
A browser-automation MCP server for AI agents that can perform page actions, navigation, and multi-step task flows.
README
Runbook AI MCP Server
An MCP (Model Context Protocol) server that provides browser automation capabilities through a Chrome extension. It allows terminal-based agents like Claude Code to interact with any website through your live browser session.
Part of the Runbook AI ecosystem. Join the Discord community to share feedback and get involved in development!
https://github.com/user-attachments/assets/a43fba64-bc40-4ef6-9840-e100203e2cf5
Why Runbook AI?
Most browser-based MCP tools (like chrome-devtools-mcp) blow up your LLM context window by sending the entire DOM after every browser action.
Runbook AI is different:
- Optimized Context: It generates a highly simplified version of the HTML. It strips the junk but keeps essential text and interaction elements. It’s condensed, fast, and won’t eat your tokens.
- The Ultimate Catch-all: If a site doesn't have a dedicated MCP server (like Expedia, LinkedIn, or internal tools), this fills the gap perfectly.
- Privacy First: It runs entirely in your browser. No remote calls except to your chosen LLM provider. No eval() or shady scripts (enforced by the Chrome extension sandbox).
- Efficient Navigation: The simplified HTML goes beyond the viewport, making scrolling and multi-page tasks much more efficient.
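The context-optimization idea can be sketched roughly as follows. This is a toy illustration, not the extension's actual algorithm; `simplifyHtml` and its attribute whitelist are invented for this example:

```typescript
// Toy sketch of context-optimized HTML: drop scripts, styles, and layout
// noise; keep visible text plus interactive elements. NOT the extension's
// real algorithm -- just an illustration of the idea.
function simplifyHtml(html: string): string {
  // Remove non-content blocks entirely.
  let out = html.replace(/<(script|style|svg|noscript)[\s\S]*?<\/\1>/gi, "");
  // Keep interactive tags but strip presentational attributes, leaving
  // only what an agent needs in order to act.
  out = out.replace(
    /<(a|button|input|select|textarea)\b([^>]*)>/gi,
    (_m, tag, attrs) => {
      const keep =
        attrs.match(/\b(href|name|type|value|placeholder|aria-label)="[^"]*"/gi) ?? [];
      return `<${tag}${keep.length ? " " + keep.join(" ") : ""}>`;
    }
  );
  // Drop every other tag, keeping its inner text.
  out = out.replace(/<\/?(?!a\b|button\b|input\b|select\b|textarea\b)[a-z][^>]*>/gi, " ");
  // Collapse whitespace so the result is token-cheap.
  return out.replace(/\s+/g, " ").trim();
}

const page = `
  <html><head><style>body{color:red}</style></head>
  <body><div class="hero"><h1>Sign in</h1>
  <script>trackUser()</script>
  <a href="/help" class="btn btn-lg">Help</a>
  <input type="email" placeholder="Email" style="width:90%">
  </div></body></html>`;

console.log(simplifyHtml(page));
// → Sign in <a href="/help">Help</a> <input type="email" placeholder="Email">
```

Even this crude pass turns a full page into a few dozen tokens while preserving the link and input the agent would need to interact with.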
Installation
MCP Server
Add to your MCP settings configuration:
{
  "mcpServers": {
    "runbook-ai": {
      "command": "npx",
      "args": ["-y", "runbook-ai-mcp@latest"]
    }
  }
}
Chrome Extension
Install the Runbook AI extension from Chrome Web Store.
Enable MCP in the extension settings, opened from the extension side panel.
Set the LLM API key, model name, and base URL. Gemini 3 Flash (gemini-3-flash-preview) is recommended. Get your free API key from Google AI Studio.
By default the extension has access to all websites. To limit access, open the extension's Details page in Chrome and add individual sites under the Site access setting.
Usage
Open Chrome and keep the extension side panel open.
Start the MCP server (your MCP client will launch it automatically when invoked).
Tool Schema
The server exposes a single tool:
browser-agent
Run a task in Chrome browser with AI and automation capabilities.
Parameters:
prompt (string, required): The task prompt for the AI agent to execute
Example:
{
  "name": "browser-agent",
  "arguments": {
    "prompt": "Go to google.com and search for 'MCP protocol'"
  }
}
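Over the wire, an MCP client wraps this call in a JSON-RPC 2.0 `tools/call` request, as defined by the MCP spec. A minimal helper that builds such an envelope might look like this (`buildToolCall` is an invented name, not part of any SDK):

```typescript
// Build a JSON-RPC 2.0 envelope for an MCP tools/call request.
// buildToolCall is an illustrative helper, not part of any SDK.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, prompt: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "browser-agent", arguments: { prompt } },
  };
}

const req = buildToolCall(1, "Go to google.com and search for 'MCP protocol'");
console.log(JSON.stringify(req, null, 2));
```

In practice your MCP client (Claude Code, etc.) builds this envelope for you; the sketch only shows what travels over stdio.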
Development
# Install dependencies
npm install
# Build
npm run build
# Run in development mode
npm run dev
# Run tests
npm test
Architecture
- MCP Server: Communicates with MCP clients via stdio
- WebSocket Server: Listens for Chrome extension connections on port 9003
- Chrome Extension: Executes browser automation tasks
When a tool is invoked:
- MCP client sends request to MCP server via stdio
- MCP server forwards request to Chrome extension via WebSocket
- Extension executes the task and returns result
- Result is sent back to MCP client
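The relay step in the middle can be thought of as a pending-request map: each forwarded request gets an id, and the matching WebSocket reply resolves the waiting promise. A minimal sketch of that pattern (`Bridge` and its methods are invented names; the real server also handles timeouts, reconnects, and framing):

```typescript
// Minimal request/response correlation, standing in for the MCP server
// <-> Chrome extension WebSocket relay. Invented names; illustrative only.
type Resolver = (result: string) => void;

class Bridge {
  private nextId = 1;
  private pending = new Map<number, Resolver>();

  // "Forward" a task: the real server would write a frame containing
  // { id, prompt } to the WebSocket here.
  forward(prompt: string): { id: number; result: Promise<string> } {
    const id = this.nextId++;
    const result = new Promise<string>((resolve) => this.pending.set(id, resolve));
    return { id, result };
  }

  // Called when the extension sends back { id, result } over the socket.
  onExtensionReply(id: number, result: string): void {
    const resolve = this.pending.get(id);
    if (resolve) {
      this.pending.delete(id);
      resolve(result);
    }
  }
}

const bridge = new Bridge();
const { id, result } = bridge.forward("search for MCP protocol");
bridge.onExtensionReply(id, "done");
result.then((r) => console.log(r)); // prints "done"
```

Because replies are matched by id, slow browser tasks and concurrent tool calls can be in flight at the same time without blocking each other.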
Contributing
Contributions are welcome! Feel free to send out a PR.