Claude KVM

by aras-workspace

An MCP server for controlling remote desktops over VNC, built on a native Swift daemon with integrated Apple Vision OCR.

<p align="center"> <a href="https://github.com/ARAS-Workspace/claude-kvm"> <img src="https://github.com/ARAS-Workspace/claude-kvm/raw/press-kit/press-kit/masters/claude-kvm-icon-animated.svg" alt="Claude KVM" width="180"/> </a> </p> <h1 align="center">Claude KVM</h1> <p align="center"><em>Remote Access, Artificial Intelligence</em></p> <p align="center"> <a href="https://www.claude-kvm.ai">claude-kvm.ai</a> </p>

Claude KVM is an MCP tool that controls remote desktop environments over VNC. It consists of a thin JS proxy layer (MCP server) and a platform-native Swift VNC daemon running on your macOS system.

[!TIP] Phantom-WG could be a great alternative for you. Isolate your VNC server within your own network while enjoying self-hosted VPN performance with the extra privacy features you gain along the way.

Live Test Runs

[!NOTE] Tests are conducted transparently on GitHub Actions — each step is visible in the CI environment. At the end of every test, whether the integration passes or fails, you'll find screenshots of each step the agent took during the session, along with an .mp4 video recording that captures the entire session. By reviewing these recordings and screenshots, you can observe how the agent progressed through each stage, how long the task took, and what decisions were made based on the system prompt. You can use these examples as a reference when crafting your own system prompts or instructions for the MCP server in your own environment.

[!WARNING] Artifacts attached to these runs may have expired due to GitHub's artifact retention policy. Persistent copies are prepared via the Persist Artifacts workflow and can always be accessed by run ID from the artifacts/ directory on the press-kit branch.

Architecture

mermaid
graph TB
    subgraph MCP["MCP Client (Claude)"]
        AI["Claude"]
    end

    subgraph Proxy["claude-kvm · MCP Proxy (stdio)"]
        direction TB
        Server["MCP Server<br/><code>index.js</code>"]
        Tools["Tool Definitions<br/><code>tools/index.js</code>"]
        Server --> Tools
    end

    subgraph Daemon["claude-kvm-daemon · Native VNC Client (stdin/stdout)"]
        direction TB
        CMD["Command Handler<br/><i>PC Dispatch</i>"]
        Scale["Display Scaling<br/><i>Scaled ↔ Native</i>"]

        subgraph Screen["Screen"]
            Capture["Frame Capture<br/><i>PNG · Crop · Diff</i>"]
            OCR["OCR Detection<br/><i>Apple Vision</i>"]
        end

        subgraph InputGroup["Input"]
            Mouse["Mouse<br/><i>Click · Drag · Move · Scroll</i>"]
            KB["Keyboard<br/><i>Tap · Combo · Type · Paste</i>"]
        end

        VNC["VNC Bridge<br/><i>LibVNCClient 0.9.15</i>"]

        CMD --> Scale
        Scale --> Capture
        Scale --> Mouse
        Scale --> KB
        Capture -.->|"framebuffer"| VNC
        Mouse -->|"pointer events"| VNC
        KB -->|"key events"| VNC
    end

    subgraph Target["Target Machine"]
        VNC_Server["VNC Server<br/><i>:5900</i>"]
        Desktop["Desktop Environment"]
        VNC_Server --> Desktop
    end

    AI <-->|"stdio<br/>JSON-RPC"| Server
    Server <-->|"stdin/stdout<br/>PC (NDJSON)"| CMD
    VNC <-->|"RFB Protocol<br/>TCP :5900"| VNC_Server

    classDef proxy fill:#1a1a2e,stroke:#16213e,color:#e5e5e5
    classDef daemon fill:#0f3460,stroke:#533483,color:#e5e5e5
    classDef target fill:#1a1a2e,stroke:#e94560,color:#e5e5e5

    class Server,Tools proxy
    class CMD,Scale,VNC,Capture,Mouse,KB daemon
    class VNC_Server,Desktop target

Layers

| Layer | Language | Role | Communication |
|---|---|---|---|
| MCP Proxy | JavaScript (Node.js) | Communicates with Claude over MCP protocol, manages daemon lifecycle | stdio JSON-RPC |
| VNC Daemon | Swift/C (Apple Silicon) | VNC connection, screen capture, mouse/keyboard input injection | stdin/stdout PC (NDJSON) |

PC (Procedure Call) Protocol

Communication between the proxy and daemon uses the PC protocol over NDJSON:

code
Request:      {"method":"<name>","params":{...},"id":<int|string>}
Response:     {"result":{...},"id":<int|string>}
Error:        {"error":{"code":<int>,"message":"..."},"id":<int|string>}
Notification: {"method":"<name>","params":{...}}
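
The four message shapes above can be framed and classified with a few lines of code. The following is a minimal sketch, not the proxy's actual implementation: the function names `encode_request` and `classify` are hypothetical, and it assumes one JSON object per line with no embedded newlines.

```python
import json


def encode_request(method, params=None, msg_id=None):
    """Serialize one PC message as a single NDJSON line.

    Leaving msg_id as None produces a notification (no "id" field).
    """
    message = {"method": method}
    if params is not None:
        message["params"] = params
    if msg_id is not None:
        message["id"] = msg_id
    return json.dumps(message, separators=(",", ":")) + "\n"


def classify(line):
    """Return (kind, message) for one NDJSON line, using the shapes listed above."""
    message = json.loads(line)
    if "error" in message:
        return "error", message
    if "result" in message:
        return "response", message
    if "method" in message:
        kind = "request" if "id" in message else "notification"
        return kind, message
    raise ValueError("not a PC message")
```

A caller would write the output of `encode_request` to the daemon's stdin and pass each line read from its stdout to `classify`, matching responses to requests by `id`.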

Coordinate Scaling

The VNC server's native resolution is scaled down to fit within --max-dimension (default: 1280px). Claude works more consistently with scaled coordinates — the daemon handles the conversion in the background:

code
Native:  4220 x 2568  (VNC server framebuffer)
Scaled:  1280 x 779   (what Claude sees and targets)

mouse_click(640, 400) → VNC receives (2110, 1319)
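
The conversion can be sketched as follows. This is an illustration only, not the daemon's code: it assumes a single scale factor derived from the longer side, round-to-nearest on both axes, and no upscaling of displays already smaller than the limit. The function names are hypothetical.

```python
def scaled_size(native_w, native_h, max_dimension=1280):
    """Fit the native framebuffer inside max_dimension, preserving aspect ratio."""
    scale = min(1.0, max_dimension / max(native_w, native_h))
    return round(native_w * scale), round(native_h * scale)


def to_native(x, y, native_w, native_h, max_dimension=1280):
    """Map a point in scaled space back to native framebuffer coordinates."""
    scaled_w, scaled_h = scaled_size(native_w, native_h, max_dimension)
    return round(x * native_w / scaled_w), round(y * native_h / scaled_h)


print(scaled_size(4220, 2568))        # (1280, 779)
print(scaled_size(4220, 2568, 960))   # (960, 584)
print(to_native(640, 400, 4220, 2568))
```

The daemon's actual rounding may differ by a pixel from this sketch.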

Screen Strategy

Claude minimizes token cost with a progressive verification approach:

code
diff_check       →  changeDetected: true/false     ~5ms    (text only, no image)
detect_elements  →  OCR text + bounding boxes      ~50ms   (text only, no image)
cursor_crop      →  crop around cursor              ~50ms   (small image)
screenshot       →  full screen capture             ~200ms  (full image)

detect_elements uses the Apple Vision framework for on-device OCR. It returns text content with bounding-box coordinates in scaled space, which enables precise click targeting without consuming vision tokens.
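
The escalation order above could be expressed as a small decision helper. This is a hedged sketch: `send` is a hypothetical callable that takes a method name and returns the result dict, and the `expected_text` and `near_cursor` parameters are illustrative assumptions. In practice the model itself decides when to escalate.

```python
def cheapest_check(send, expected_text=None, near_cursor=True):
    """Return the name of the cheapest check that settles the question.

    send(method) -> dict is a hypothetical transport to the daemon.
    """
    diff = send("diff_check")
    if not diff.get("changeDetected"):
        return "diff_check"  # nothing changed, so no image is needed

    if expected_text is not None:
        elements = send("detect_elements").get("elements", [])
        if any(expected_text in el.get("text", "") for el in elements):
            return "detect_elements"  # confirmed from OCR text alone

    # Text was not enough: fall back to an image, smallest first.
    return "cursor_crop" if near_cursor else "screenshot"
```

The point of the ordering is that the two text-only checks run first, so an image is only requested when they cannot confirm the outcome.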


Installation

Requirements

  • macOS (Apple Silicon / aarch64)
  • Node.js (LTS)

Daemon

bash
brew tap ARAS-Workspace/tap
brew install claude-kvm-daemon

[!NOTE] claude-kvm-daemon is compiled and code-signed via CI (GitHub Actions). The build output is packaged in two formats: a .tar.gz archive for Homebrew distribution and a .dmg disk image for notarization. The DMG is submitted to Apple servers for notarization within the same workflow — the process can be tracked from CI logs. The notarized DMG is available as a CI Artifact; the archived .tar.gz is also published as a release on the repository. Homebrew installation tracks this release.

MCP Configuration

Create a .mcp.json file in your project directory:

json
{
  "mcpServers": {
    "claude-kvm": {
      "command": "npx",
      "args": ["-y", "claude-kvm"],
      "env": {
        "VNC_HOST": "192.168.1.100",
        "VNC_PORT": "5900",
        "VNC_USERNAME": "user",
        "VNC_PASSWORD": "pass",
        "CLAUDE_KVM_DAEMON_PATH": "/opt/homebrew/bin/claude-kvm-daemon",
        "CLAUDE_KVM_DAEMON_PARAMETERS": "-v"
      }
    }
  }
}

[!NOTE] The tool is end-to-end tested via CI — Claude executes tasks over VNC while an independent vision model observes and verifies the results. See the Integration Test for live workflow runs, system prompts, and demo recordings.

Configuration

MCP Proxy (ENV)

| Parameter | Default | Description |
|---|---|---|
| VNC_HOST | 127.0.0.1 | VNC server address |
| VNC_PORT | 5900 | VNC port number |
| VNC_USERNAME | | Username (required for ARD) |
| VNC_PASSWORD | | Password |
| CLAUDE_KVM_DAEMON_PATH | claude-kvm-daemon | Daemon binary path (not needed if already in PATH) |
| CLAUDE_KVM_DAEMON_PARAMETERS | | Additional CLI arguments for the daemon |

Daemon Parameters (CLI)

Additional arguments passed to the daemon via CLAUDE_KVM_DAEMON_PARAMETERS:

code
"CLAUDE_KVM_DAEMON_PARAMETERS": "--max-dimension 800 -v"

| Parameter | Default | Description |
|---|---|---|
| --max-dimension | 1280 | Maximum display scaling dimension (px) |
| --connect-timeout | | VNC connection timeout (seconds) |
| --bits-per-sample | | Bits per pixel sample |
| --no-reconnect | | Disable automatic reconnection |
| -v, --verbose | | Verbose logging (stderr) |

Runtime Configuration (PC)

All timing and display parameters are configurable at runtime via the configure method. Use get_timing to inspect current values.

Set timing:

json
{"method":"configure","params":{"click_hold_ms":80,"key_hold_ms":50}}
json
{"result":{"detail":"OK — changed: click_hold_ms, key_hold_ms"}}

Change display scaling:

json
{"method":"configure","params":{"max_dimension":960}}
json
{"result":{"detail":"OK — changed: max_dimension","scaledWidth":960,"scaledHeight":584}}

Reset to defaults:

json
{"method":"configure","params":{"reset":true}}
json
{"result":{"detail":"OK — reset to defaults","timing":{"click_hold_ms":50,"combo_mod_ms":10,"cursor_crop_radius":150,"double_click_gap_ms":50,"drag_min_steps":10,"drag_pixels_per_step":20,"drag_position_ms":30,"drag_press_ms":50,"drag_settle_ms":30,"drag_step_ms":5,"hover_settle_ms":400,"key_hold_ms":30,"max_dimension":1280,"paste_settle_ms":30,"scroll_press_ms":10,"scroll_tick_ms":20,"type_inter_key_ms":20,"type_key_ms":20,"type_shift_ms":10},"scaledWidth":1280,"scaledHeight":779}}

Get current values:

json
{"method":"get_timing"}
json
{"result":{"timing":{"click_hold_ms":80,"combo_mod_ms":10,"cursor_crop_radius":150,"double_click_gap_ms":50,"drag_min_steps":10,"drag_pixels_per_step":20,"drag_position_ms":30,"drag_press_ms":50,"drag_settle_ms":30,"drag_step_ms":5,"hover_settle_ms":400,"key_hold_ms":50,"max_dimension":1280,"paste_settle_ms":30,"scroll_press_ms":10,"scroll_tick_ms":20,"type_inter_key_ms":20,"type_key_ms":20,"type_shift_ms":10},"scaledWidth":1280,"scaledHeight":779}}

| Parameter | Default | Description |
|---|---|---|
| max_dimension | 1280 | Max screenshot dimension (px) |
| cursor_crop_radius | 150 | Cursor crop radius (px) |
| click_hold_ms | 50 | Click hold duration |
| double_click_gap_ms | 50 | Double-click gap delay |
| hover_settle_ms | 400 | Hover settle wait |
| drag_position_ms | 30 | Pre-drag position wait |
| drag_press_ms | 50 | Drag press hold threshold |
| drag_step_ms | 5 | Delay between interpolation points |
| drag_settle_ms | 30 | Settle before release |
| drag_pixels_per_step | 20 | Interpolation point density (pixels per step) |
| drag_min_steps | 10 | Min interpolation steps |
| scroll_press_ms | 10 | Scroll press-release gap |
| scroll_tick_ms | 20 | Inter-tick delay |
| key_hold_ms | 30 | Key hold duration |
| combo_mod_ms | 10 | Modifier settle delay |
| type_key_ms | 20 | Key hold during typing |
| type_inter_key_ms | 20 | Inter-character delay |
| type_shift_ms | 10 | Shift key settle |
| paste_settle_ms | 30 | Post-clipboard write wait |
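
To make the drag parameters concrete, here is one plausible way `drag_pixels_per_step` and `drag_min_steps` could interact. This is an assumption inferred from the parameter names, not the daemon's confirmed behaviour; the function `drag_points` is hypothetical.

```python
import math


def drag_points(x0, y0, x1, y1, pixels_per_step=20, min_steps=10):
    """Linearly interpolate pointer positions between the drag endpoints.

    Assumed rule: one step per pixels_per_step of distance,
    but never fewer than min_steps.
    """
    distance = math.hypot(x1 - x0, y1 - y0)
    steps = max(min_steps, math.ceil(distance / pixels_per_step))
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]
```

Under this reading, a 100 px drag would still use 10 steps (the minimum), while a 400 px drag would use 20, with `drag_step_ms` as the pause between consecutive points.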

Tools

All operations are performed through a single vnc_command tool:

Screen

| Action | Parameters | Description |
|---|---|---|
| screenshot | | Full screen PNG capture |
| cursor_crop | | Crop around cursor with crosshair overlay |
| diff_check | | Detect screen changes against baseline |
| set_baseline | | Save current screen as diff reference |

Mouse

| Action | Parameters | Description |
|---|---|---|
| mouse_click | x, y, button? | Click (left, right, or middle) |
| mouse_double_click | x, y | Double click |
| mouse_move | x, y | Move cursor |
| hover | x, y | Move + settle wait |
| nudge | dx, dy | Relative cursor movement |
| mouse_drag | x, y, toX, toY | Drag from start to end |
| scroll | x, y, direction, amount? | Scroll (up, down, left, or right) |

Keyboard

| Action | Parameters | Description |
|---|---|---|
| key_tap | key | Single key press (enter, escape, tab, space, ...) |
| key_combo | key or keys | Modifier combo ("cmd+c" or ["cmd","shift","3"]) |
| key_type | text | Type text character by character |
| paste | text | Paste text via clipboard |
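
Since key_combo accepts either a "+"-joined string or a list, a caller may want to normalize both forms before building a request. The helper below is a hypothetical sketch. Lowercasing and whitespace trimming are assumptions, and it does not handle a literal "+" key inside the string form.

```python
def normalize_combo(combo):
    """Turn "cmd+c" or ["cmd", "shift", "3"] into a list of key names."""
    keys = combo.split("+") if isinstance(combo, str) else list(combo)
    keys = [key.strip().lower() for key in keys]
    if not keys or any(not key for key in keys):
        raise ValueError(f"invalid key combo: {combo!r}")
    return keys


print(normalize_combo("cmd+c"))                # ['cmd', 'c']
print(normalize_combo(["cmd", "shift", "3"]))  # ['cmd', 'shift', '3']
```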

Detection

| Action | Parameters | Description |
|---|---|---|
| detect_elements | | OCR text detection with bounding boxes (Apple Vision) |

Returns text elements with bounding box coordinates in scaled space:

json
{"method":"detect_elements"}
json
{"result":{"detail":"13 elements","elements":[{"confidence":1,"h":9,"text":"Finder","w":32,"x":37,"y":6},{"confidence":1,"h":9,"text":"File","w":15,"x":84,"y":6},{"confidence":1,"h":9,"text":"Edit","w":19,"x":112,"y":6},{"confidence":1,"h":9,"text":"View","w":22,"x":143,"y":6},{"confidence":1,"h":11,"text":"Go","w":15,"x":179,"y":6},{"confidence":1,"h":9,"text":"Window","w":35,"x":207,"y":6},{"confidence":1,"h":11,"text":"Help","w":22,"x":255,"y":6},{"confidence":1,"h":11,"text":"8•","w":26,"x":1161,"y":6},{"confidence":1,"h":9,"text":"Fri Feb 20 22:19","w":80,"x":1189,"y":6},{"confidence":1,"h":9,"text":"Assets","w":32,"x":1202,"y":97},{"confidence":1,"h":9,"text":"Passwords.kdbx","w":74,"x":1181,"y":168},{"confidence":1,"h":93,"text":"PHANTOM","w":633,"x":322,"y":477},{"confidence":1,"h":32,"text":"YOUR SERVER, YOUR NETWORK, YOUR PRIVACY","w":629,"x":325,"y":568}],"scaledHeight":717,"scaledWidth":1280}}
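
A result like the one above can be turned into a click target without requesting an image. The sketch below assumes that `x` and `y` give the top-left corner of each bounding box (the response does not state this explicitly) and uses a hypothetical helper name and confidence threshold.

```python
def click_target(elements, text, min_confidence=0.5):
    """Return the centre of the first element whose text matches exactly.

    Assumes (x, y) is the top-left corner and (w, h) the box size,
    all in scaled space.
    """
    for el in elements:
        if el["text"] == text and el["confidence"] >= min_confidence:
            return el["x"] + el["w"] // 2, el["y"] + el["h"] // 2
    return None


elements = [
    {"confidence": 1, "h": 9, "text": "Finder", "w": 32, "x": 37, "y": 6},
    {"confidence": 1, "h": 9, "text": "File", "w": 15, "x": 84, "y": 6},
]
print(click_target(elements, "Finder"))  # (53, 10)
```

Because the coordinates are already in scaled space, the returned point can be passed straight to mouse_click.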

Configuration

| Action | Parameters | Description |
|---|---|---|
| configure | `{<params>}` | Set timing/display params at runtime |
| configure | `{reset: true}` | Reset all params to defaults |
| get_timing | | Get current timing + display params |

Control

| Action | Parameters | Description |
|---|---|---|
| wait | ms? | Wait (default 500ms) |
| health | | Connection status + display info |
| shutdown | | Graceful daemon shutdown |

Authentication

Supported VNC authentication methods:

  • VNC Auth — password-based challenge-response (DES)
  • ARD — Apple Remote Desktop (Diffie-Hellman + AES-128-ECB)

macOS is auto-detected via the ARD auth type 30 credential request. When detected, Meta keys are remapped to Super (Command key compatibility).
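
The Meta-to-Super remapping can be illustrated with the standard X11 keysym values that RFB key events carry. This is a sketch of the idea only: the function name and the `target_is_macos` flag are hypothetical, and the daemon's real logic is not shown in this README.

```python
# Standard X11 keysym values used in RFB key events.
XK_META_L, XK_META_R = 0xFFE7, 0xFFE8
XK_SUPER_L, XK_SUPER_R = 0xFFEB, 0xFFEC

# Security type number the text above associates with ARD.
ARD_SECURITY_TYPE = 30

_MACOS_REMAP = {XK_META_L: XK_SUPER_L, XK_META_R: XK_SUPER_R}


def remap_keysym(keysym, target_is_macos):
    """Send Super instead of Meta when the target was detected as macOS."""
    if not target_is_macos:
        return keysym
    return _MACOS_REMAP.get(keysym, keysym)


def is_macos_target(offered_security_types):
    """Assumed detection rule: the server offers ARD authentication."""
    return ARD_SECURITY_TYPE in offered_security_types
```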


[!NOTE] Running on a bare-metal Mac? See the Mac M1 Preparation Tricks for VNC hardening, SSH tunneling, and session stability tips.


"Claude" is a trademark of Anthropic, PBC. This project is not affiliated with or endorsed by Anthropic.

Copyright (c) 2026 Riza Emre ARAS — MIT License
