me.openbrowser/openbrowser-ai

by billy-enrizky

AI-powered browser automation: script navigation, clicks, typing, and data extraction in async Python, well suited to orchestrating web tasks.

README

OpenBrowser

Saved Cookies and Scheduled Tasks are available in the cloud-hosted version. Join the waitlist for early access: https://openbrowser.me

https://github.com/user-attachments/assets/b17f97f3-f9f8-4707-8e39-abbbbe1a693b

Automating Walmart Product Scraping:

https://github.com/user-attachments/assets/c517c739-9199-47b0-bac7-c2c642a21094

OpenBrowserAI Automatic Flight Booking:

https://github.com/user-attachments/assets/632128f6-3d09-497f-9e7d-e29b9cb65e0f

OpenBrowserAI Automatic Form Filling:

https://github.com/user-attachments/assets/16f7ef1a-beb1-45e2-a733-9592536e0ef7

Badges: PyPI version · Downloads · Python 3.12+ · License: MIT · Tests · Coverage

<!-- mcp-name: me.openbrowser/openbrowser-ai -->

AI-powered browser automation using CodeAgent and CDP (Chrome DevTools Protocol)

OpenBrowser is a framework for intelligent browser automation. It combines direct CDP communication with a CodeAgent architecture, where the LLM writes Python code executed in a persistent namespace, to navigate, interact with, and extract information from web pages autonomously.

Documentation

Full documentation: https://docs.openbrowser.me

Key Features

  • CodeAgent Architecture - LLM writes Python code in a persistent Jupyter-like namespace for browser automation
  • Raw CDP Communication - Direct Chrome DevTools Protocol for maximum control and speed
  • Vision Support - Screenshot analysis for visual understanding of pages
  • 12+ LLM Providers - OpenAI, Anthropic, Google, Groq, AWS Bedrock, Azure OpenAI, Ollama, and more
  • MCP Server - Model Context Protocol support for Claude Desktop integration
  • CLI Daemon - Persistent browser daemon with -c flag for direct code execution from Bash
  • Video Recording - Record browser sessions as video files

Installation

Quick install (macOS / Linux)

bash
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh

Quick install (Windows PowerShell)

powershell
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex

Detects uv, pipx, or pip and installs OpenBrowser automatically.

Install to ~/.local/bin without sudo:

bash
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh -s -- --local

Homebrew (macOS / Linux)

bash
brew tap billy-enrizky/openbrowser
brew install openbrowser-ai

pip

bash
pip install openbrowser-ai

uv (recommended)

bash
uv pip install openbrowser-ai

uvx (zero install)

Run directly without installing -- uvx downloads and caches the package automatically:

bash
# MCP server mode
uvx openbrowser-ai --mcp

# CLI daemon mode
uvx openbrowser-ai -c "await navigate('https://example.com')"

pipx

bash
pipx install openbrowser-ai

From source

bash
git clone https://github.com/billy-enrizky/openbrowser-ai.git
cd openbrowser-ai
uv pip install -e ".[agent]"

Optional Dependencies

bash
pip install openbrowser-ai[agent]      # LLM agent support (langgraph, langchain, litellm)
pip install openbrowser-ai[all]        # All LLM providers
pip install openbrowser-ai[anthropic]  # Anthropic Claude
pip install openbrowser-ai[groq]       # Groq
pip install openbrowser-ai[ollama]     # Ollama (local models)
pip install openbrowser-ai[aws]        # AWS Bedrock
pip install openbrowser-ai[azure]      # Azure OpenAI
pip install openbrowser-ai[video]      # Video recording support

No separate browser install needed. OpenBrowser auto-detects any installed Chromium-based browser (Chrome, Edge, Brave, Chromium) and uses it directly. If none is found and uvx is available, Chromium is installed automatically on first run. To pre-install manually (requires uvx): openbrowser-ai install

Quick Start

Basic Usage

python
import asyncio
from openbrowser import CodeAgent, ChatGoogle

async def main():
    agent = CodeAgent(
        task="Go to google.com and search for 'Python tutorials'",
        llm=ChatGoogle(model="gemini-3-flash"),
    )

    result = await agent.run()
    print(f"Result: {result}")

asyncio.run(main())

With Different LLM Providers

python
from openbrowser import CodeAgent, ChatOpenAI, ChatAnthropic, ChatGoogle

# OpenAI
agent = CodeAgent(task="...", llm=ChatOpenAI(model="gpt-5.2"))

# Anthropic
agent = CodeAgent(task="...", llm=ChatAnthropic(model="claude-sonnet-4-6"))

# Google Gemini
agent = CodeAgent(task="...", llm=ChatGoogle(model="gemini-3-flash"))

Using Browser Session Directly

python
import asyncio
from openbrowser import BrowserSession, BrowserProfile

async def main():
    profile = BrowserProfile(
        headless=True,
        viewport_width=1920,
        viewport_height=1080,
    )
    
    session = BrowserSession(browser_profile=profile)
    await session.start()
    
    await session.navigate_to("https://example.com")
    screenshot = await session.screenshot()
    
    await session.stop()

asyncio.run(main())

Configuration

Environment Variables

bash
# Google (recommended)
export GOOGLE_API_KEY="..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Groq
export GROQ_API_KEY="gsk_..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

BrowserProfile Options

python
from openbrowser import BrowserProfile

profile = BrowserProfile(
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    disable_security=False,
    extra_chromium_args=["--disable-gpu"],
    record_video_dir="./recordings",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    },
)

Supported LLM Providers

| Provider | Class | Models |
|---|---|---|
| Google | ChatGoogle | gemini-3-flash, gemini-3-pro |
| OpenAI | ChatOpenAI | gpt-5.2, o4-mini, o3 |
| Anthropic | ChatAnthropic | claude-sonnet-4-6, claude-opus-4-6 |
| Groq | ChatGroq | llama-4-scout, qwen3-32b |
| AWS Bedrock | ChatAWSBedrock | anthropic.claude-sonnet-4-6, amazon.nova-pro |
| AWS Bedrock (Anthropic) | ChatAnthropicBedrock | Claude models via Anthropic Bedrock SDK |
| Azure OpenAI | ChatAzureOpenAI | Any Azure-deployed model |
| OpenRouter | ChatOpenRouter | Any model on openrouter.ai |
| DeepSeek | ChatDeepSeek | deepseek-chat, deepseek-r1 |
| Cerebras | ChatCerebras | llama-4-scout, qwen-3-235b |
| Ollama | ChatOllama | llama-4-scout, deepseek-r1 (local) |
| OCI | ChatOCIRaw | Oracle Cloud GenAI models |
| Browser-Use | ChatBrowserUse | External LLM service |

Claude Code Plugin

Install OpenBrowser as a Claude Code plugin:

bash
# Add the marketplace (one-time)
claude plugin marketplace add billy-enrizky/openbrowser-ai

# Install the plugin
claude plugin install openbrowser@openbrowser-ai

This installs the MCP server and 6 built-in skills:

| Skill | Description |
|---|---|
| web-scraping | Extract structured data, handle pagination |
| form-filling | Fill forms, login flows, multi-step wizards |
| e2e-testing | Test web apps by simulating user interactions |
| page-analysis | Analyze page content, structure, metadata |
| accessibility-audit | Audit pages for WCAG compliance |
| file-download | Download files (PDFs, CSVs) using browser session |

See plugin/README.md for detailed tool parameter documentation.

Codex

OpenBrowser works with OpenAI Codex via native skill discovery.

Quick Install

Tell Codex:

code
Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.codex/INSTALL.md

Manual Install

bash
# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.codex/openbrowser

# Symlink skills for native discovery
mkdir -p ~/.agents/skills
ln -s ~/.codex/openbrowser/plugin/skills ~/.agents/skills/openbrowser

# Restart Codex

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .codex/INSTALL.md

OpenCode

OpenBrowser works with OpenCode.ai via plugin and skill symlinks.

Quick Install

Tell OpenCode:

code
Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.opencode/INSTALL.md

Manual Install

bash
# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.config/opencode/openbrowser

# Create directories
mkdir -p ~/.config/opencode/plugins ~/.config/opencode/skills

# Symlink plugin and skills
ln -s ~/.config/opencode/openbrowser/.opencode/plugins/openbrowser.js ~/.config/opencode/plugins/openbrowser.js
ln -s ~/.config/opencode/openbrowser/plugin/skills ~/.config/opencode/skills/openbrowser

# Restart OpenCode

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .opencode/INSTALL.md

OpenClaw

OpenClaw supports OpenBrowser via the CLI daemon. Install OpenBrowser, then use openbrowser-ai -c from the Bash tool:

bash
openbrowser-ai -c "await navigate('https://example.com')"
openbrowser-ai -c "print(await evaluate('document.title'))"

The daemon starts automatically on first use and persists variables across calls.

For OpenClaw plugin documentation, see docs.openclaw.ai/tools/plugin.

MCP Server

MCP Registry

OpenBrowser includes an MCP (Model Context Protocol) server that exposes browser automation as tools for AI assistants like Claude. Listed on the MCP Registry as me.openbrowser/openbrowser-ai. No external LLM API keys required -- the MCP client provides the intelligence.

Quick Setup

Claude Code: add to your project's .mcp.json:

json
{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "--mcp"]
    }
  }
}

Claude Desktop: add to ~/Library/Application Support/Claude/claude_desktop_config.json:

json
{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "--mcp"],
      "env": {
        "OPENBROWSER_HEADLESS": "true"
      }
    }
  }
}

Run directly:

bash
uvx openbrowser-ai --mcp

Tool

The MCP server exposes a single execute_code tool that runs Python code in a persistent namespace with browser automation functions. The LLM writes Python code to navigate, interact, and extract data, returning only what was explicitly requested.

Available functions (all async, use await):

| Category | Functions |
|---|---|
| Navigation | navigate(url, new_tab), go_back(), wait(seconds) |
| Interaction | click(index), input_text(index, text, clear), scroll(down, pages, index), send_keys(keys), upload_file(index, path) |
| Dropdowns | select_dropdown(index, text), dropdown_options(index) |
| Tabs | switch(tab_id), close(tab_id) |
| JavaScript | evaluate(code): run JS in page context, returns Python objects |
| Downloads | download_file(url, filename): download a file using browser cookies; list_downloads(): list downloaded files |
| State | browser.get_browser_state_summary(): get page metadata and interactive elements |
| CSS | get_selector_from_index(index): get CSS selector for an element |
| Completion | done(text, success): signal task completion |

Pre-imported libraries: json, csv, re, datetime, asyncio, Path, requests, numpy, pandas, matplotlib, BeautifulSoup
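As an illustration, a single execute_code call can batch several of these functions and return only the requested value. In the sketch below, navigate and evaluate are stubbed so the example runs standalone; in the real server they are pre-bound in the persistent namespace:

```python
import asyncio

# Stub stand-ins for the namespace functions listed above; the real
# implementations drive the browser over CDP.
async def navigate(url, new_tab=False):
    pass

async def evaluate(code):
    return "Example Domain"  # stubbed result for document.title

async def task():
    # One execute_code payload batches navigation + extraction,
    # returning only the value that was explicitly requested.
    await navigate("https://example.com")
    title = await evaluate("document.title")
    return title

print(asyncio.run(task()))
```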

Configuration

| Environment Variable | Description | Default |
|---|---|---|
| OPENBROWSER_HEADLESS | Run browser without GUI | false |
| OPENBROWSER_ALLOWED_DOMAINS | Comma-separated domain whitelist | (none) |
| OPENBROWSER_COMPACT_DESCRIPTION | Minimal tool description (~500 tokens) | false |
| OPENBROWSER_MAX_OUTPUT | Max output characters per execution | 10000 |
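For example, a locked-down headless server could be launched with these variables set (the domain list and output cap below are illustrative values, not defaults):

```shell
# Illustrative: headless MCP server restricted to two domains,
# with a smaller per-execution output cap.
export OPENBROWSER_HEADLESS=true
export OPENBROWSER_ALLOWED_DOMAINS="example.com,docs.example.com"
export OPENBROWSER_MAX_OUTPUT=5000
uvx openbrowser-ai --mcp
```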

Benchmark: Token Efficiency

CLI Benchmark: 4-Way Comparison (6 Tasks, N=3 runs)

Four CLI tools were compared, each exposed to the agent through a single Bash tool, using Claude Sonnet 4.6 on Bedrock with randomized task order. All four achieve 100% accuracy.

<p align="center"> <img src="benchmarks/cli_benchmark_scatter.png" alt="CLI Benchmark: Token Usage vs Duration" width="800" /> </p>

| CLI Tool | Duration (mean +/- std) | Tool Calls | Bedrock API Tokens | Response Chars |
|---|---|---|---|---|
| openbrowser-ai | 84.8 +/- 10.9s | 15.3 +/- 2.3 | 36,010 +/- 6,063 | 9,452 +/- 472 |
| browser-use | 106.0 +/- 9.5s | 20.7 +/- 6.4 | 77,123 +/- 33,354 | 36,241 +/- 12,940 |
| agent-browser | 99.0 +/- 6.8s | 25.0 +/- 4.0 | 90,107 +/- 3,698 | 56,009 +/- 39,733 |
| playwright-cli | 118.3 +/- 21.4s | 25.7 +/- 8.1 | 94,130 +/- 35,982 | 84,065 +/- 49,713 |

openbrowser-ai uses 2.1-2.6x fewer tokens than all competitors via Python code batching and compact DOM representation.

<p align="center"> <img src="benchmarks/cli_benchmark_overview.png" alt="CLI Benchmark: Overview" width="800" /> </p>

Per-Task Token Usage

<p align="center"> <img src="benchmarks/cli_benchmark_per_task.png" alt="CLI Benchmark: Per-Task Token Usage" width="800" /> </p>

| Task | openbrowser-ai | browser-use | playwright-cli | agent-browser |
|---|---|---|---|---|
| fact_lookup | 2,504 | 4,710 | 16,857 | 9,676 |
| form_fill | 7,887 | 15,811 | 31,757 | 19,226 |
| multi_page_extract | 2,354 | 2,405 | 8,886 | 8,117 |
| search_navigate | 16,539 | 47,936 | 27,779 | 44,367 |
| deep_navigation | 2,178 | 3,747 | 4,705 | 5,534 |
| content_analysis | 4,548 | 2,515 | 4,147 | 3,189 |

openbrowser-ai wins 5 of 6 tasks. The advantage is largest on complex pages (search_navigate: 2.9x fewer tokens than browser-use) where code batching avoids repeated page state dumps.

Cost per Benchmark Run (6 Tasks)

| Model | openbrowser-ai | browser-use | playwright-cli | agent-browser |
|---|---|---|---|---|
| Claude Sonnet 4.6 ($3/$15 per M) | $0.12 | $0.24 | $0.29 | $0.27 |
| Claude Opus 4.6 ($5/$25 per M) | $0.24 | $0.45 | $0.56 | $0.51 |

Raw results are in benchmarks/e2e_4way_cli_results.json. A full 4-way comparison with methodology is also available.

E2E LLM Benchmark: MCP Server Comparison (6 Tasks, N=5 runs)

<p align="center"> <img src="benchmarks/benchmark_comparison.png" alt="E2E LLM Benchmark: MCP Server Comparison" width="800" /> </p>

| MCP Server | Pass Rate | Duration (mean +/- std) | Tool Calls | Bedrock API Tokens |
|---|---|---|---|---|
| Playwright MCP (Microsoft) | 100% | 62.7 +/- 4.8s | 9.4 +/- 0.9 | 158,787 |
| Chrome DevTools MCP (Google) | 100% | 103.4 +/- 2.7s | 19.4 +/- 0.5 | 299,486 |
| OpenBrowser MCP | 100% | 77.0 +/- 6.7s | 13.8 +/- 2.0 | 50,195 |

OpenBrowser uses 3.2x fewer tokens than Playwright and 6.0x fewer than Chrome DevTools. MCP response sizes: Playwright 1,132,173 chars, Chrome DevTools 1,147,244 chars, OpenBrowser 7,853 chars -- a 144x difference.
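The quoted ratios follow directly from the table values and response sizes:

```python
# Recomputing the ratios quoted above from the benchmark figures.
openbrowser, playwright, devtools = 50_195, 158_787, 299_486

print(round(playwright / openbrowser, 1))  # tokens vs Playwright MCP
print(round(devtools / openbrowser, 1))    # tokens vs Chrome DevTools MCP
print(round(1_132_173 / 7_853))            # MCP response chars vs Playwright
```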

A full MCP comparison with methodology is also available.

CLI Usage

bash
# Run a browser automation task with an LLM agent
uvx openbrowser-ai -p "Search for Python tutorials on Google"

# Execute code directly via persistent daemon
uvx openbrowser-ai -c "await navigate('https://example.com')"
uvx openbrowser-ai -c "print(await evaluate('document.title'))"

# Daemon management
uvx openbrowser-ai daemon start     # Start daemon (auto-starts on first -c call)
uvx openbrowser-ai daemon stop      # Stop daemon and browser
uvx openbrowser-ai daemon status    # Show daemon info
uvx openbrowser-ai daemon restart   # Restart daemon

# Install browser
uvx openbrowser-ai install

# Run MCP server
uvx openbrowser-ai --mcp

The -c flag connects to a persistent browser daemon over a Unix socket (localhost TCP on Windows). Variables persist across calls while the daemon is running. The daemon starts automatically on first use and shuts down after 10 minutes of inactivity.
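Conceptually, this persistence works like a single namespace that every -c payload executes in. The following is a simplified stand-in for that behavior, not the daemon's actual code:

```python
# Simplified model of daemon persistence: every -c payload runs in the
# same namespace dict, so variables survive across calls.
namespace: dict = {}

def execute(code: str) -> None:
    exec(code, namespace)

execute("pages = []")                   # first -c call
execute("pages.append('example.com')")  # a later call still sees `pages`
execute("result = len(pages)")
print(namespace["result"])
```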

Project Structure

code
openbrowser-ai/
├── .claude-plugin/            # Claude Code marketplace config
├── .codex/                    # Codex integration
│   └── INSTALL.md
├── .opencode/                 # OpenCode integration
│   ├── INSTALL.md
│   └── plugins/openbrowser.js
├── plugin/                    # Plugin package (skills + MCP config)
│   ├── .claude-plugin/
│   ├── .mcp.json
│   └── skills/                # 6 browser automation skills
├── src/openbrowser/
│   ├── __init__.py            # Main exports
│   ├── cli.py                 # CLI commands
│   ├── config.py              # Configuration
│   ├── actor/                 # Element interaction
│   ├── agent/                 # LangGraph agent
│   ├── browser/               # CDP browser control
│   ├── code_use/              # Code agent + shared executor
│   ├── daemon/                # Persistent browser daemon (Unix socket)
│   ├── dom/                   # DOM extraction
│   ├── llm/                   # LLM providers
│   ├── mcp/                   # MCP server
│   └── tools/                 # Action registry
├── benchmarks/                # MCP benchmarks and E2E tests
│   ├── playwright_benchmark.py
│   ├── cdp_benchmark.py
│   ├── openbrowser_benchmark.py
│   └── e2e_published_test.py
└── tests/                     # Test suite

Testing

bash
# Run unit tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# E2E test the MCP server against the published PyPI package
uv run python benchmarks/e2e_published_test.py

Benchmarks

Run individual MCP server benchmarks (JSON-RPC stdio, 5-step Wikipedia workflow):

bash
uv run python benchmarks/openbrowser_benchmark.py   # OpenBrowser MCP
uv run python benchmarks/playwright_benchmark.py     # Playwright MCP
uv run python benchmarks/cdp_benchmark.py            # Chrome DevTools MCP

Raw results are in benchmarks/e2e_4way_cli_results.json. See full comparison for methodology.

Backend and Frontend Deployment

The project includes a FastAPI backend and a Next.js frontend, both containerized with Docker.

Prerequisites

  • Docker and Docker Compose
  • A .env file in the project root with POSTGRES_PASSWORD and any LLM API keys (see backend/env.example)

Local Development (Docker Compose)

bash
# Start backend + PostgreSQL (frontend runs locally)
docker-compose -f docker-compose.dev.yml up --build

# In a separate terminal, start the frontend
cd frontend && npm install && npm run dev

| Service | URL | Description |
|---|---|---|
| Backend | http://localhost:8000 | FastAPI + WebSocket + VNC |
| Frontend | http://localhost:3000 | Next.js dev server |
| PostgreSQL | localhost:5432 | Chat persistence |
| VNC | ws://localhost:6080 | Live browser view |

The dev compose mounts backend/app/ and src/ as volumes for hot-reload. API keys are loaded from backend/.env via env_file. The POSTGRES_PASSWORD is read from the root .env file.

Full Stack (Docker Compose)

bash
# Start all services (backend + frontend + PostgreSQL)
docker-compose up --build

This builds and runs both the backend and frontend containers together with PostgreSQL.

Backend

The backend is a FastAPI application in backend/ with a Dockerfile at backend/Dockerfile. It includes:

  • REST API on port 8000
  • WebSocket endpoint at /ws for real-time agent communication
  • VNC support (Xvfb + x11vnc + websockify) for live browser viewing on ports 6080-6090
  • Kiosk security: Openbox window manager, Chromium enterprise policies, X11 key grabber daemon
  • Health check at /health

bash
# Build the backend image
docker build -f backend/Dockerfile -t openbrowser-backend .

# Run standalone
docker run -p 8000:8000 -p 6080:6080 \
  --env-file backend/.env \
  -e VNC_ENABLED=true \
  -e AUTH_ENABLED=false \
  --shm-size=2g \
  openbrowser-backend

Frontend

The frontend is a Next.js application in frontend/ with a Dockerfile at frontend/Dockerfile.

bash
# Build the frontend image
cd frontend && docker build -t openbrowser-frontend .

# Run standalone
docker run -p 3000:3000 \
  -e NEXT_PUBLIC_API_URL=http://localhost:8000 \
  -e NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws \
  openbrowser-frontend

Environment Variables

Key environment variables for the backend (see backend/env.example for the full list):

| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_KEY | Google/Gemini API key | (required) |
| DEFAULT_LLM_MODEL | Default model for agents | gemini-3-flash-preview |
| AUTH_ENABLED | Enable Cognito JWT auth | false |
| VNC_ENABLED | Enable VNC browser viewing | true |
| DATABASE_URL | PostgreSQL connection string | (optional) |
| POSTGRES_PASSWORD | PostgreSQL password (root .env) | (required for compose) |

Research: Reinforcement Fine-Tuning for Browser Agents

Beyond the framework, we conducted two independent research studies on improving browser agents through reinforcement learning, both using the FormFactory benchmark (1,250 form-filling tasks across 8 domains) and OpenBrowser's browser execution environment.

Study 1: Browser-in-the-Loop (Autoregressive RL)

We investigated whether reinforcement learning can improve a language model's ability to fill web forms beyond what supervised learning achieves.

  • Method: Two-phase pipeline -- SFT on Qwen3-8B with QLoRA (992 demonstrations), then online GRPO with live browser execution rewards (composite: 40% submission success + 40% field accuracy + 20% execution completeness)
  • Result: GRPO achieves 9.1% higher average reward than SFT alone on held-out validation (p=0.007, Wilcoxon signed-rank test). Improvement comes specifically from better form submission, not field filling.
  • Key finding: SFT is a prerequisite -- without it, the base model generates unstructured text and earns zero reward across all attempts.
  • Paper: ResearchGate DOI: 10.13140/RG.2.2.24922.71360
  • Models: Qwen3-8B-FormFactory-SFT-LoRA, Qwen3-8B-FormFactory-GRPO-LoRA

Study 2: Diffusion Language Models for Web Action Planning

We investigated whether diffusion language models -- which generate text by iteratively denoising an entire sequence in parallel rather than left-to-right -- can learn web action planning.

  • Models tested: ReFusion 8B (masked diffusion with causal LM backbone) and FS-DFM 1.3B (pure discrete flow matching)
  • Result: After SFT, diffusion models solve 60-69% of tasks vs. 100% for the AR baseline. Token-level RL is universally fragile (2/16 comparisons improve, both insignificant). Sequence-level RL succeeds: MDPO pushes ReFusion to 91.9% (+31.4pp) and ESPO pushes FS-DFM to 87.1% (+18.6pp).
  • Key finding: The appropriate RL formulation is architecture-dependent. ELBO-based optimization (ESPO) produces concentrated distributions across architectures, while per-step trajectory methods produce multimodal distributions.
  • Paper: ResearchGate DOI: 10.13140/RG.2.2.11500.94088
  • Models: 10 trained models on HuggingFace including ReFusion-8B-MDPO, FS-DFM-1.3B-ESPO-mu8, and more

Reproducing RL Experiments

All training code is in infra/training/. Training runs on a single NVIDIA A10G GPU (24GB VRAM) via Anyscale.

bash
# Study 1: Autoregressive RL (Qwen3-8B)
# SFT phase -- QLoRA fine-tuning on 992 FormFactory demonstrations (2-4 hours)
python infra/training/finetuning/sft_trainer.py

# Online GRPO phase -- browser-in-the-loop reward (4-8 hours per epoch)
# Requires headless Chromium + FormFactory forms server
python infra/training/shared/formfactory_server.py &   # Start form server
python infra/training/finetuning/online_grpo_trainer.py

# Evaluate SFT and GRPO checkpoints on val/test splits
python infra/training/finetuning/eval_sft.py

# Study 2: Diffusion LM RL (ReFusion 8B, FS-DFM 1.3B)
# SFT phase
python infra/training/flow_matching/fsdfm_sft_trainer.py    # FS-DFM SFT
python infra/training/flow_matching/flow_sft_trainer.py      # ReFusion SFT

# Sequence-level RL (best results)
python infra/training/flow_matching/espo_fsdfm_trainer.py    # ESPO on FS-DFM
python infra/training/flow_matching/espo_refusion_trainer.py # ESPO on ReFusion
python infra/training/flow_matching/mdpo_fsdfm_trainer.py    # MDPO on FS-DFM
python infra/training/flow_matching/mdpo_refusion_trainer.py # MDPO on ReFusion

# Submit jobs to Anyscale cloud
python infra/training/anyscale/submit_job.py --config infra/training/anyscale/online_grpo_job.yaml

# Push trained checkpoints to HuggingFace
python infra/training/anyscale/push_checkpoints_to_hf.py

# Serve trained model locally via vLLM or Ollama
python infra/training/serving/serve_vllm.py
python infra/training/serving/export_gguf.py   # Export to GGUF for Ollama

Reward function (in infra/training/shared/reward_functions.py): composite score = 0.4 * task completion + 0.4 * field accuracy + 0.2 * execution completeness. Online reward (online_reward.py) launches headless Chromium, executes the model's action plan, and computes the score from live browser state.
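A minimal sketch of that composite score, using the weights stated above; the parameter names (submitted, fields_correct, and so on) are hypothetical, not the repo's actual API:

```python
# Composite reward sketch: 0.4 * submission success + 0.4 * field
# accuracy + 0.2 * execution completeness (weights from the README;
# argument names are illustrative).
def composite_reward(submitted: bool, fields_correct: int, fields_total: int,
                     steps_run: int, steps_planned: int) -> float:
    submission = 1.0 if submitted else 0.0
    accuracy = fields_correct / fields_total if fields_total else 0.0
    completeness = steps_run / steps_planned if steps_planned else 0.0
    return 0.4 * submission + 0.4 * accuracy + 0.2 * completeness

# A submitted form with 9/10 correct fields and all steps executed:
print(composite_reward(True, 9, 10, 5, 5))
```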

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact


Made with love for the AI automation community
