io.github.xidik12/oculo

编码与调试

by xidik12

由 AI 驱动的原生浏览器,内置 12 个 MCP 工具;每页上下文开销约 30 tokens,轻量高效。

什么是 io.github.xidik12/oculo

由 AI 驱动的原生浏览器,内置 12 个 MCP 工具;每页上下文开销约 30 tokens,轻量高效。

README

<p align="center"> <img src="docs/logo.png" alt="Oculo" width="120"> </p> <h1 align="center">Oculo</h1> <p align="center"><strong>AI-Powered Native Browser</strong></p> <p align="center"> <a href="https://github.com/xidik12/oculo/stargazers"><img src="https://img.shields.io/github/stars/xidik12/oculo?style=flat" alt="Stars"></a> <a href="https://github.com/xidik12/oculo/releases"><img src="https://img.shields.io/github/v/release/xidik12/oculo" alt="Release"></a> <img src="https://img.shields.io/badge/Electron-34-47848F?logo=electron&logoColor=white" alt="Electron"> <img src="https://img.shields.io/badge/TypeScript-5.7-3178C6?logo=typescript&logoColor=white" alt="TypeScript"> <img src="https://img.shields.io/badge/React-19-61DAFB?logo=react&logoColor=black" alt="React"> <img src="https://img.shields.io/badge/MCP-12%20tools-orange" alt="MCP Tools"> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License"></a> </p> <p align="center"> <a href="https://getoculo.com">Website</a> &middot; <a href="https://github.com/xidik12/oculo/releases">Download</a> &middot; <a href="#quick-start">Quick Start</a> &middot; <a href="#12-mcp-tools">MCP Tools</a> &middot; <a href="CONTRIBUTING.md">Contributing</a> </p>

Cursor : VSCode :: Oculo : Chrome

Open-source AI browser that gives Claude Code, Cursor, Windsurf, and any MCP client the ability to see and interact with any website. 12 tools, under 300 tokens per flow.

Why Oculo?

Feature
Native browserFull Chromium engine -- not a wrapper, extension, or headless scraper
12 MCP toolspage, act, fill, read, run, media, shell, tabs, research, preview, translate, lens
< 300 tokens/flowCompact responses by default -- cheaper than screenshot-based approaches
Self-healing automationSelector caching + DOM diffing -- 44%+ faster on repeated workflows
Multi-provider AIBuilt-in chat with Claude, OpenAI, Gemini, Grok, OpenClaw, Ollama
4-level securityauto / notify / confirm / blocked permission gate on every action
OS keychain vaultCredentials encrypted via electron.safeStorage (macOS Keychain / Windows DPAPI)
PII redactionCredit cards, SSNs, JWTs, API keys, Bearer tokens stripped from all MCP responses
Anti-injectionContent boundary markers + regex-based injection detection
19 stealth patchesNavigator, WebGL, canvas, WebRTC, audio, font, battery, screen fingerprint defenses
Headless modeRun without UI -- Docker support included
Cross-platformmacOS, Windows, Linux
Python SDKpip install oculo -- sync and async clients

Quick Start

Download

Grab the latest release from Releases, or build from source:

bash
git clone https://github.com/xidik12/oculo.git
cd oculo
npm install
npm run dev

Register with Claude Code

bash
claude mcp add oculo -- node ~/oculo/bin/oculo-mcp.mjs

Register with Cursor / Windsurf

Add to your MCP config (.cursor/mcp.json or equivalent):

json
{
  "mcpServers": {
    "oculo": {
      "command": "node",
      "args": ["/path/to/oculo/bin/oculo-mcp.mjs"]
    }
  }
}

Tools are always discoverable (static definitions in the bridge), but Oculo must be running for tool calls to succeed.

12 MCP Tools

ToolWhat it doesToken cost
pageDescribe current page -- headings, forms, buttons, links. Supports compact, a11y (ref-tagged), and markdown modes~30-80
actNavigate, click, hover, scroll, type, press keys, login via vault, manage tabs, cookies, proxy, recording~1 line
fillFill form fields by label/placeholder matching, optional submit. Handles text, select, checkbox, contenteditable~1 line
readExtract structured data -- search results, tables, lists, articlescompact
runMulti-step pipeline with conditionals (page/act/fill/read/wait/if). Cached for replayheader + last
mediaGenerate images (Nano Banana 2 / DALL-E 3) or videos (Veo 3.1). Image-to-image editingfile path
shellExecute shell commands non-interactively (ls, npm, git, python, etc.)stdout+stderr
tabsList all open browser tabs with URLs and titlescompact
researchDeep web research -- opens multiple tabs, reads pages, synthesizes findingssynthesized
previewPre-fetch a URL without navigating away from the current pagepage description
translateTranslate page content or specific text to any languagetranslated text
lensVisual analysis of the current page via screenshot + AI visiondescription

Bonus: webmcp_list and webmcp_call discover and invoke page-declared tools via the WebMCP protocol.

Example Flows

code
You: "Log into GitHub and star the oculo repo"

Claude Code calls:
  1. act({action: "navigate", url: "https://github.com/login"})
  2. act({action: "login", site: "github.com"})         # vault lookup
  3. act({action: "navigate", url: "https://github.com/xidik12/oculo"})
  4. act({action: "click", text: "Star"})

Total: 4 tool calls, <100 tokens response
code
You: "Fill out the contact form on example.com"

Claude Code calls:
  1. act({action: "navigate", url: "https://example.com/contact"})
  2. page()                                               # see the form
  3. fill({fields: {"Name": "...", "Email": "..."}, submit: true})

Total: 3 tool calls

Headless Mode

Run Oculo without a visible window for CI/CD, scraping, or server-side automation:

bash
# Via convenience script
node bin/oculo-headless.mjs

# Or with flags
npx electron . --headless
npx electron . --headless --headless-auto-approve   # auto-approve CONFIRM actions

# Environment variable
OCULO_HEADLESS=1 npm run dev

Docker

bash
docker compose up

The included Dockerfile and docker-compose.yml run Oculo headless in a container with Xvfb.

Python SDK

python
from oculo import OculoClient

# Auto-discovers port from ~/.oculo-port
client = OculoClient()

# Describe the page
print(client.page())

# Navigate
client.act("navigate", url="https://example.com")

# Fill a form
client.fill({"Email": "hi@oculo.com", "Message": "Hello!"}, submit=True)

# Extract data
results = client.read("search results", format="json")

Async version available:

python
from oculo import AsyncOculoClient

async_client = AsyncOculoClient()
await async_client.act("navigate", url="https://example.com")

Install from the SDK directory:

bash
pip install oculo

Architecture

code
Claude Code / Cursor / Windsurf
        |
        | stdio (MCP protocol)
        v
  bin/oculo-mcp.mjs            <-- stdio-to-HTTP bridge
        |
        | HTTP POST :19516/mcp (auth token)
        v
  McpServerManager              <-- Electron main process
        |
        | IPC
        v
  Renderer (React 19)           <-- Chromium process
        |
        | webview.executeJavaScript()
        v
  <webview> tags                <-- Actual web pages

Why HTTP instead of stdio? Electron's <webview> is only accessible from the renderer process. The main process (where stdio lives) can't touch page content. The HTTP bridge solves this via main-to-renderer IPC.

Port discovery: Oculo writes port:authtoken to ~/.oculo-port on startup. The bridge reads this file automatically.

Security

Permission Levels

LevelActionsBehavior
Autonavigate, page, read, scroll, screenshot, back, forward, reload, hover, listTabs, switchTab, preview, translate, lensExecutes silently
Notifyclick, type, fill, select, press, submit, newTab, closeTabExecutes + OS notification
Confirmpayment, delete_account, change_password, send_email, download, oauth, shell, evaluate, setProxy, startRecordingNative dialog approval required
Blockedread_vault, export_cookies, export_tokens, disable_securityAlways rejected

Credential Vault

  • Encrypted with electron.safeStorage (OS Keychain on macOS, DPAPI on Windows)
  • Passwords never returned via IPC or MCP -- only domain + username exposed
  • act({action: "login", site: "github.com"}) retrieves and fills credentials automatically

PII Redaction

All MCP responses pass through a redactor before reaching the AI client. Stripped patterns: credit card numbers, SSNs, JWTs, API keys, private keys, Bearer tokens.

Anti-Injection

MCP content is wrapped in boundary markers. Regex-based detection blocks prompt injection attempts embedded in page content.

Stealth (19 patches)

Navigator (webdriver, languages, plugins, mimeTypes, connection, hardwareConcurrency, deviceMemory), window (chrome API, dimensions), WebGL (vendor/renderer spoofing), canvas (fingerprint randomization), WebRTC (IP leak prevention), AudioContext, font enumeration blocking, Battery API, screen resolution randomization.

Self-Healing Automation

After successful act or fill calls, element selectors are cached with stability scores:

Selector typeScore
id10
data-testid10
aria-label9
role + name8
text7
css5

On subsequent runs, DOM diffing determines the strategy:

  • > 80% similarity -- replay from cache (no LLM call needed)
  • 50-80% -- fallback to alternative selectors
  • < 50% -- re-engage AI for fresh resolution

AI Providers

Built-in chat panel supports multiple providers:

ProviderAuthModels
ClaudeAPI Key or CLI subscriptionOpus, Sonnet, Haiku
OpenAIAPI Key or Codex CLIGPT-4o, GPT-4o mini, o1, o3
GeminiAPI Key2.0 Flash, 1.5 Pro, 1.5 Flash
GrokAPI KeyGrok 2, Grok 2 Mini
OllamaLocal (no key)Any pulled model
OpenClawAPI KeyOpenClaw models

Building

bash
# Production build
npm run build

# Platform distributables
npm run dist:mac      # macOS DMG + ZIP
npm run dist:win      # Windows NSIS + portable
npm run dist:linux    # Linux AppImage + deb

# Other commands
npm run typecheck     # TypeScript checking
npm run lint          # ESLint
npm run test          # Vitest
npm run clean         # Remove build artifacts

Prerequisites

  • Node.js 20+
  • npm (not pnpm/yarn -- native modules require npm)
  • macOS, Windows, or Linux

Project Structure

code
src/
  main/                    Electron main process
    ai/agent.ts            Multi-provider AI controller
    captcha/               CAPTCHA detection + solvers
    data/                  Bookmarks, downloads, history, session recording
    engine/                Page describer, extractor, form-detector, pipeline, resolver,
                           selector-cache, dom-differ, tab-manager
    mcp/server.ts          HTTP MCP server (port 19516-19520, auth token)
    mcp/tools/             act, fill, page, read, run tool handlers
    network/proxy.ts       HTTP/SOCKS proxy manager
    security/              Vault, permissions, redactor, audit, anti-injection
  preload/index.ts         contextBridge API
  renderer/
    App.tsx                Root browser UI component
    components/            TabBar, AddressBar, ChatPanel, WebViewContainer,
                           bookmarks, downloads, find, history, layout, common
  shared/                  Types, constants, IPC channels, AI provider definitions
bin/
  oculo-mcp.mjs            stdio-to-HTTP MCP bridge (for Claude Code / Cursor)
  oculo-headless.mjs        Headless mode launcher
sdk/python/                Python SDK (pip install oculo)
Dockerfile                 Container deployment
docker-compose.yml         Docker Compose for headless mode

Contributing

See CONTRIBUTING.md for development setup, architecture details, and how to add new MCP tools.

Donate

If Oculo saves you time, consider supporting development:

BTC: 12yRGpUfFznzZoz4yVfZKRxLSkAwbanw2B

License

MIT


Built by Salakhitdinov Khidayotullo | getoculo.com

常见问题

io.github.xidik12/oculo 是什么?

由 AI 驱动的原生浏览器,内置 12 个 MCP 工具;每页上下文开销约 30 tokens,轻量高效。

相关 Skills

网页构建器

by anthropics

Universal
热门

面向复杂 claude.ai HTML artifact 开发,快速初始化 React + Tailwind CSS + shadcn/ui 项目并打包为单文件 HTML,适合需要状态管理、路由或多组件交互的页面。

在 claude.ai 里做复杂网页 Artifact 很省心,多组件、状态和路由都能顺手搭起来,React、Tailwind 与 shadcn/ui 组合效率高、成品也更精致。

编码与调试
未扫描114.1k

前端设计

by anthropics

Universal
热门

面向组件、页面、海报和 Web 应用开发,按鲜明视觉方向生成可直接落地的前端代码与高质感 UI,适合做 landing page、Dashboard 或美化现有界面,避开千篇一律的 AI 审美。

想把页面做得既能上线又有设计感,就用前端设计:组件到整站都能产出,难得的是能避开千篇一律的 AI 味。

编码与调试
未扫描114.1k

网页应用测试

by anthropics

Universal
热门

用 Playwright 为本地 Web 应用编写自动化测试,支持启动开发服务器、校验前端交互、排查 UI 异常、抓取截图与浏览器日志,适合调试动态页面和回归验证。

借助 Playwright 一站式验证本地 Web 应用前端功能,调 UI 时还能同步查看日志和截图,定位问题更快。

编码与调试
未扫描114.1k

相关 MCP Server

GitHub

编辑精选

by GitHub

热门

GitHub 是 MCP 官方参考服务器,让 Claude 直接读写你的代码仓库和 Issues。

这个参考服务器解决了开发者想让 AI 安全访问 GitHub 数据的问题,适合需要自动化代码审查或 Issue 管理的团队。但注意它只是参考实现,生产环境得自己加固安全。

编码与调试
83.4k

by Context7

热门

Context7 是实时拉取最新文档和代码示例的智能助手,让你告别过时资料。

它能解决开发者查找文档时信息滞后的问题,特别适合快速上手新库或跟进更新。不过,依赖外部源可能导致偶尔的数据延迟,建议结合官方文档使用。

编码与调试
52.2k

by tldraw

热门

tldraw 是让 AI 助手直接在无限画布上绘图和协作的 MCP 服务器。

这解决了 AI 只能输出文本、无法视觉化协作的痛点——想象让 Claude 帮你画流程图或白板讨论。最适合需要快速原型设计或头脑风暴的开发者。不过,目前它只是个基础连接器,你得自己搭建画布应用才能发挥全部潜力。

编码与调试
46.3k

评论