io.github.jfarcand/iphone-mirroir-mcp

by jfarcand

Control a real iPhone through macOS iPhone Mirroring, with support for screenshots, taps, swipes, and text input.

README

<p align="center"> <img src="website/public/mirroir-wordmark.svg" alt="mirroir-mcp" width="128" /> </p>

mirroir-mcp

npm version Build Install Installers MCP Compliance License macOS 15+ Discord

Give your AI eyes, hands, and a real iPhone. An MCP server that lets any AI agent see the screen, tap what it needs, and figure the rest out — through macOS iPhone Mirroring. Experimental support for macOS windows. 32 tools, any MCP client.

Requirements

macOS 15 or later with iPhone Mirroring set up and connected to your iPhone.

Install

bash
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"

or via npx:

bash
npx -y mirroir-mcp install

or via Homebrew:

bash
brew tap jfarcand/tap && brew install mirroir-mcp

The first time you take a screenshot, macOS will prompt for Screen Recording and Accessibility permissions. Grant both.

<details> <summary>Per-client setup</summary>

Claude Code

bash
claude mcp add --transport stdio mirroir -- npx -y mirroir-mcp

GitHub Copilot (VS Code)

Install from the MCP server gallery: search @mcp mirroir in the Extensions view, or add to .vscode/mcp.json:

json
{
  "servers": {
    "mirroir": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "mirroir-mcp"]
    }
  }
}

Cursor

Add to .cursor/mcp.json in your project root:

json
{
  "mcpServers": {
    "mirroir": {
      "command": "npx",
      "args": ["-y", "mirroir-mcp"]
    }
  }
}

OpenAI Codex

bash
codex mcp add mirroir -- npx -y mirroir-mcp

Or add to ~/.codex/config.toml:

toml
[mcp_servers.mirroir]
command = "npx"
args = ["-y", "mirroir-mcp"]
</details> <details> <summary>Install from source</summary>
bash
git clone https://github.com/jfarcand/mirroir-mcp.git
cd mirroir-mcp
./mirroir.sh

Use the full path to the binary in your .mcp.json: <repo>/.build/release/mirroir-mcp.

</details>

How it works

Every interaction follows the same loop: observe, reason, act. describe_screen gives the AI every text element with tap coordinates (eyes). The LLM decides what to do next (brain). tap, type_text, swipe execute the action (hands) — then it loops back to observe. No scripts, no coordinates, just intent.
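The observe, reason, act loop can be sketched as runnable pseudocode. The tool functions below are stand-ins for the real MCP tool calls, and the goal-matching rule is an illustrative assumption, not the server's actual logic:

```python
# Sketch of the observe -> reason -> act loop, with stubbed tools.
# In the real server these are MCP tool calls; here they are fakes
# so the control flow runs on its own.

def describe_screen():
    # Real tool: OCR/vision pass returning text elements + tap coordinates.
    return [{"text": "Settings", "x": 200, "y": 640},
            {"text": "General", "x": 200, "y": 720}]

actions = []

def tap(x, y):
    actions.append(("tap", x, y))

def agent_step(goal):
    """One iteration: observe the screen, pick a matching element, act on it."""
    elements = describe_screen()                                     # observe (eyes)
    target = next((e for e in elements if goal in e["text"]), None)  # reason (brain)
    if target:
        tap(target["x"], target["y"])                                # act (hands)
    return target is not None

found = agent_step("General")  # taps "General" at (200, 720)
```

In practice the LLM plays the role of the element-selection step, so intent ("open General") replaces hardcoded coordinates.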

Examples

Paste any of these into Claude Code, Claude Desktop, ChatGPT, Cursor, or any MCP client:

code
Open Messages, find my conversation with Alice, and send "running 10 min late".
code
Open Calendar, create a new event called "Dentist" next Tuesday at 2pm.
code
Open my Expo Go app, tap "LoginDemo", test the login screen with
test@example.com / password123. Screenshot after each step.
code
Start recording, open Settings, scroll to General > About, stop recording.

Screen Intelligence

describe_screen is the AI's eyes. Three backends work together to give the agent a complete picture of what's on screen — text, icons, and semantic UI structure.

Apple Vision OCR (default)

The default backend uses Apple's Vision framework to detect every text element on screen and return exact tap coordinates. This is fast, local, and requires no API keys or external services.

Icon Detection (YOLO CoreML)

Text-only OCR misses non-text UI elements — buttons, toggles, tab bar icons, activity rings. Drop a YOLO CoreML model (.mlmodelc) in ~/.mirroir-mcp/models/ and the server auto-detects it at startup, merging icon detection results with OCR text. The AI gets tap targets for elements that text-only OCR cannot see.

| Mode | `ocrBackend` setting | Behavior |
| --- | --- | --- |
| Auto-detect (default) | `"auto"` | Uses Vision + YOLO if a model is installed, Vision only otherwise |
| Vision only | `"vision"` | Apple Vision OCR text only |
| YOLO only | `"yolo"` | CoreML element detection only |
| Both | `"both"` | Always merge both backends (falls back to Vision if no model) |
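A minimal sketch of the merge step: keep every OCR text element and add icon detections that do not overlap one. The overlap-based dedup rule is an assumption for illustration, not the server's actual merge logic:

```python
# Merge OCR text elements with YOLO icon detections (illustrative).
# Boxes are (x, y, width, height).

def overlaps(a, b):
    """Axis-aligned bounding-box intersection test."""
    ax, ay, aw, ah = a["box"]
    bx, by, bw, bh = b["box"]
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def merge(ocr, icons):
    """Keep all OCR text; add icons that don't overlap any text element."""
    merged = list(ocr)
    for icon in icons:
        if not any(overlaps(icon, t) for t in ocr):
            merged.append(icon)
    return merged

ocr = [{"label": "General", "box": (100, 700, 180, 40)}]
icons = [{"label": "toggle", "box": (500, 700, 60, 40)},    # kept: no overlap
         {"label": "glyph", "box": (120, 710, 40, 20)}]     # dropped: overlaps "General"

result = merge(ocr, icons)
```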

AI Vision Mode (embacle)

Instead of local OCR, describe_screen can send the screenshot to an AI vision model that identifies UI elements semantically — cards, tabs, buttons, icons, navigation structure — not just raw text. This produces richer context for the agent, especially on screens with complex layouts.

The embacle runtime is embedded directly into the mirroir-mcp binary via Rust FFI. describe_screen calls the embedded runtime in-process — no separate server, no network round-trip, no additional setup. The FFI layer (EmbacleFFI.swift + libembacle.a) handles initialization, chat completion requests, and memory management across the Swift/Rust boundary.

embacle routes vision requests through already-authenticated CLI tools (GitHub Copilot, Claude Code) so there is no separate API key to manage. If you have a Copilot or Claude Code subscription, you already have access.

Install

bash
brew tap dravr-ai/tap
brew install embacle          # CLI tools (embacle-server, embacle-mcp)
brew install embacle-ffi      # Rust FFI static library (libembacle.a)

Then rebuild mirroir-mcp from source (or reinstall via Homebrew) so the binary links against libembacle.a:

bash
# From source
swift build -c release

# Or via Homebrew (rebuilds automatically)
brew reinstall mirroir-mcp

Zero-config activation

When the embacle FFI is linked into the binary, screenDescriberMode defaults to "auto" which automatically resolves to vision mode. No settings change required — install embacle-ffi, rebuild, and describe_screen starts using AI vision.

To force local OCR even when embacle is available, explicitly set "ocr":

json
// .mirroir-mcp/settings.json
{
  "screenDescriberMode": "ocr"
}

See Configuration for all available settings.

Skills

When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like Tap "Email" use OCR — no hardcoded coordinates.

Place files in ~/.mirroir-mcp/skills/ (global) or <cwd>/.mirroir-mcp/skills/ (project-local).

markdown
---
version: 1
name: Commute ETA Notification
app: Waze, Messages
tags: ["workflow", "cross-app"]
---

## Steps

1. Launch **Waze**
2. Wait for "Où va-t-on ?" to appear
3. Tap "Où va-t-on ?"
4. Wait for "${DESTINATION:-Travail}" to appear
5. Tap "${DESTINATION:-Travail}"
6. Wait for "Y aller" to appear
7. Tap "Y aller"
8. Wait for "min" to appear
9. Remember: Read the commute time and ETA.
10. Press Home
11. Launch **Messages**
12. Tap "New Message"
13. Type "${RECIPIENT}" and select the contact
14. Type "On my way! ETA {eta}"
15. Press **Return**
16. Screenshot: "message_sent"

${VAR} placeholders resolve from environment variables. ${VAR:-default} for fallbacks.
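A sketch of how such placeholders can be resolved with a regular expression, mirroring the `${VAR}` / `${VAR:-default}` syntax above. The exact resolution rules in mirroir-mcp may differ:

```python
import os
import re

# Matches ${NAME} or ${NAME:-default}.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve(line, env=os.environ):
    """Replace placeholders from env; fall back to the default if given,
    otherwise leave the placeholder untouched."""
    def sub(m):
        name, default = m.group(1), m.group(2)
        if name in env:
            return env[name]
        return default if default is not None else m.group(0)
    return PLACEHOLDER.sub(sub, line)

print(resolve('Tap "${DESTINATION:-Travail}"', env={}))           # -> Tap "Travail"
print(resolve('Type "${RECIPIENT}"', env={"RECIPIENT": "Alice"})) # -> Type "Alice"
```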

Skill Marketplace

Install ready-to-use skills from jfarcand/mirroir-skills:

bash
git clone https://github.com/jfarcand/mirroir-skills ~/.mirroir-mcp/skills

From Exploration to CI

The generate_skill tool lets an AI agent explore an app and produce SKILL.md files. It uses breadth-first search (BFS) to traverse the app as a navigation graph — screens are nodes, tappable elements are edges. The explorer describes each screen, matches elements against component definitions to decide what to tap, visits child screens, and backtracks via the back chevron. Duplicate screens are skipped via structural fingerprinting. See Component Detection below for how the explorer interprets raw elements into structured UI components.

The explorer works viewport-by-viewport: after calibrating the page length, it builds a plan from the current viewport, taps elements top-to-bottom, scrolls down to reveal more content, and rebuilds the plan for each new viewport. This approach works with both OCR and AI vision describers. Pass seed for deterministic ordering across runs.

Exploration is bounded — it does not discover every reachable screen in large apps. Depth, screen count, and time limits keep runs practical. For targeted flows, provide a goal to focus the traversal.
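The traversal above can be sketched as a plain BFS over a toy navigation graph: screens are nodes, tappable elements are edges, and a structural fingerprint skips duplicates. The adjacency map here stands in for live describe_screen/tap calls:

```python
from collections import deque

# Toy app graph: screen -> {tappable element: destination screen}.
app = {
    "Root":    {"General": "General", "Display": "Display"},
    "General": {"About": "About"},
    "Display": {},
    "About":   {},
}

def fingerprint(screen):
    # The real explorer fingerprints the screen's element structure;
    # the name suffices for this toy graph.
    return screen

def explore(root, max_screens=30):
    """Breadth-first traversal with duplicate-screen skipping and a screen budget."""
    visited, order = set(), []
    frontier = deque([root])
    while frontier and len(order) < max_screens:
        screen = frontier.popleft()
        if fingerprint(screen) in visited:
            continue                          # structurally identical screen: skip
        visited.add(fingerprint(screen))
        order.append(screen)                  # "describe" this screen
        for dest in app[screen].values():
            frontier.append(dest)             # tap -> child screen, then backtrack

    return order

order = explore("Root")  # -> ["Root", "General", "Display", "About"]
```

Seeding the element iteration order is what makes runs deterministic; with a fixed `seed`, the same edges are tried in the same order every time.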

mermaid
graph TD
    A["Launch App"] --> B["Describe Screen"]
    B --> C{"Calibrated?"}
    C -- No --> D["Scroll Full Page"]
    D --> E{"skip_calibration?"}
    E -- No --> F["Component Detect +\nClassify + Validate"]
    E -- Yes --> G["Classify Elements\nDirectly"]
    F --> H["Build Plan"]
    G --> H
    C -- Yes --> H

    H --> I{"Untried\nElements?"}
    I -- Yes --> J["Tap Element"]
    I -- No --> K["Return to Root"]

    J --> M["Describe +\nClassify Edge"]
    M --> N{"Transition"}
    N -- new screen --> O["Add to Frontier"]
    O --> P["Backtrack"]
    N -- revisited/dead --> P

    P -- push: tap back --> H
    P -- modal: tap close --> H
    P -- tab: tap prev --> H

    K --> Q{"Frontier\nEmpty?"}
    Q -- No --> R["Next Frontier\nScreen"]
    R --> B
    Q -- Yes --> S["Generate SKILL.md"]

Generate

Two modes: autonomous exploration (BFS) and guided session (manual step-by-step).

Autonomous BFS exploration — the agent explores on its own:

code
Explore the Settings app and generate a skill that checks the iOS version.

This calls generate_skill(action: "explore", app_name: "Settings", goal: "check iOS version") under the hood. The explorer launches the app, runs BFS from the root screen, and outputs a SKILL.md for the discovered path.

| Parameter | Default | Description |
| --- | --- | --- |
| `app_name` | required | App to explore |
| `goal` | none | Focus exploration toward a specific flow (e.g. "check software version") |
| `goals` | none | Array of goals — one SKILL.md per goal |
| `max_depth` | 6 | Maximum BFS depth |
| `max_screens` | 30 | Maximum screens to visit |
| `max_time` | 300 | Maximum seconds before stopping |
| `strategy` | auto | "mobile" (default), "social" (Reddit, Instagram), or "desktop" (macOS windows) |
| `skip_calibration` | false | Skip component detection during calibration; scrolling still runs. Useful with AI vision describers that produce clean semantic elements |
| `seed` | random | Integer seed for deterministic exploration ordering; the same seed produces identical tap sequences |
| `fresh` | true | Discard the persisted navigation graph and explore from scratch; set false for incremental exploration |

Guided session — the AI navigates manually, capturing each screen:

  1. generate_skill(action: "start", app_name: "MyApp") — launch app, OCR first screen
  2. Use tap/swipe/type_text to navigate, then generate_skill(action: "capture") to record each screen
  3. generate_skill(action: "finish") — assemble captured screens into a SKILL.md

Test

Run skills deterministically from the CLI — no AI in the loop:

bash
mirroir test apps/settings/check-about
mirroir test --junit results.xml --verbose        # JUnit output
mirroir test --dry-run apps/settings/check-about    # validate without executing

| Option | Description |
| --- | --- |
| `--junit <path>` | Write JUnit XML report |
| `--screenshot-dir <dir>` | Save failure screenshots (default: ./mirroir-test-results/) |
| `--timeout <seconds>` | wait_for timeout (default: 15) |
| `--verbose` | Step-by-step detail |
| `--dry-run` | Parse and validate without executing |
| `--no-compiled` | Skip compiled skills, force full OCR |

Exit code 0 = all pass, 1 = any failure.

Compiled Skills

Compile a skill once to capture coordinates and timing. Replay with zero OCR — a 10-step skill drops from 5+ seconds of OCR to under a second.

bash
mirroir compile apps/settings/check-about        # compile
mirroir test apps/settings/check-about            # auto-detects .compiled.json
mirroir test --no-compiled check-about            # force full OCR

AI agents auto-compile skills as a side-effect of the first MCP run. See Compiled Skills for details.
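The replay idea can be sketched as follows. The `.compiled.json` shape here is hypothetical, invented for illustration; see the Compiled Skills docs for the real format:

```python
import json
import time

# Hypothetical compiled-skill payload: coordinates and timing pinned
# at compile time, so replay needs no screen description pass.
compiled = json.loads("""
{
  "name": "check-about",
  "steps": [
    {"action": "tap", "x": 200, "y": 640, "delay_ms": 0},
    {"action": "tap", "x": 200, "y": 720, "delay_ms": 150}
  ]
}
""")

def replay(skill, tap):
    """Execute recorded steps verbatim: sleep the recorded delay, then act."""
    for step in skill["steps"]:
        time.sleep(step["delay_ms"] / 1000)
        if step["action"] == "tap":
            tap(step["x"], step["y"])

taps = []
replay(compiled, lambda x, y: taps.append((x, y)))  # taps -> [(200, 640), (200, 720)]
```

This is why replay is fast: each step costs only its recorded delay, with no OCR pass in between.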

AI-Assisted Diagnosis

When a test step fails, pass --agent to get an AI diagnosis of what went wrong and suggested fixes:

bash
mirroir test --agent gpt-5.3 apps/settings/check-about
mirroir test --agent claude-sonnet-4-6 apps/settings/check-about
mirroir test --agent ollama:llama3 apps/settings/check-about
mirroir test --agent embacle apps/settings/check-about

Built-in agents:

| Agent | Provider | API Key |
| --- | --- | --- |
| `gpt-5.3` | OpenAI | `OPENAI_API_KEY` |
| `claude-sonnet-4-6`, `claude-haiku-4-5` | Anthropic | `ANTHROPIC_API_KEY` |
| `ollama:<model>` | Ollama (local) | None |
| `embacle`, `embacle:claude` | embacle-server | CLI agent key |

Custom agents can be defined as YAML profiles in ~/.mirroir-mcp/agents/.

<details> <summary>No API key? Use embacle</summary>

embacle routes requests through already-authenticated CLI tools (GitHub Copilot, Claude Code, etc.) — no separate API key needed:

bash
brew tap dravr-ai/tap && brew install embacle
mirroir test --agent embacle my-skill
</details>

Component Detection

The explorer doesn't guess from raw OCR — it matches screen regions against component definitions. Raw OCR returns a flat list of text elements with no structure (General and > are two unrelated strings). Component definitions bridge this gap: each definition is a .md file that describes a UI pattern (table rows, toggles, tab bars, summary cards) with match rules, interaction behavior, and grouping logic.

The detection pipeline groups OCR elements into rows, evaluates each row against all loaded definitions using hard constraints (zone, element count, chevron presence) and soft scoring signals, then selects the highest-scoring match. Multi-row elements (e.g. Health app summary cards with title + subtitle + value) are absorbed into a single tappable component via the grouping rules.

Each definition specifies:

  • Match Rules — zone (nav bar / content / tab bar), element count range, chevron/numeric/text patterns, minimum OCR confidence
  • Interaction — whether to tap, which element to target (first_navigation_element, centered_element, etc.), expected result (navigates, toggles, dismisses), and whether to backtrack after
  • Grouping — how many points below the anchor row to absorb, and under what conditions

20 iOS component definitions ship built-in. Place custom definitions in ~/.mirroir-mcp/components/ or <cwd>/.mirroir-mcp/components/. Test a definition against the current live screen with calibrate_component.
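The hard-constraint-then-soft-score selection can be sketched as below. These inline definitions are illustrative stand-ins, not the shipped `.md` definition format:

```python
# Sketch of the detection pipeline: hard constraints gate a candidate
# row, soft signals score it, and the highest score wins.

definitions = [
    {"name": "table-row", "zone": "content", "needs_chevron": True,
     # Soft signal: prefer rows shaped like "label + chevron".
     "score": lambda row: 1.0 + 0.5 * (len(row["elements"]) == 2)},
    {"name": "toggle-row", "zone": "content", "needs_chevron": False,
     "score": lambda row: 0.8},
]

def detect(row):
    """Return the name of the best-matching definition, or None."""
    best, best_score = None, 0.0
    for d in definitions:
        # Hard constraints: wrong zone or missing chevron disqualifies.
        if d["zone"] != row["zone"]:
            continue
        if d["needs_chevron"] and ">" not in row["elements"]:
            continue
        s = d["score"](row)          # soft scoring signals
        if s > best_score:
            best, best_score = d["name"], s
    return best

row = {"zone": "content", "elements": ["General", ">"]}
match = detect(row)  # -> "table-row"
```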

Vision Indicators

AI vision describers describe UI elements semantically ("Activité chevron") rather than character-by-character ("Activité" + ">"). A vision-indicators.md file maps these descriptions to OCR-compatible characters so the component pipeline works identically with both backends:

markdown
## Indicators
- chevron: >
- dismiss: ×
- back: <

When a vision element ends with a mapped suffix (e.g. "Entraînements chevron"), the normalizer splits it into two elements: "Entraînements" + ">". Place vision-indicators.md alongside your component definitions.
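A minimal sketch of that normalizer, using the same indicator mapping as the example above (the real implementation may handle more cases):

```python
# Map semantic suffixes from vision describers to OCR-style symbols,
# mirroring the vision-indicators.md example.
INDICATORS = {"chevron": ">", "dismiss": "×", "back": "<"}

def normalize(element):
    """Split 'Label chevron' into ['Label', '>']; pass other elements through."""
    for word, symbol in INDICATORS.items():
        suffix = " " + word
        if element.endswith(suffix):
            return [element[:-len(suffix)], symbol]
    return [element]

parts = normalize("Entraînements chevron")  # -> ["Entraînements", ">"]
```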

See Component Detection for the full definition format, match rule reference, and the detection pipeline.

Security

Giving an AI access to your phone demands defense in depth. mirroir-mcp is fail-closed at every layer.

  • Tool permissions — Without a config file, only read-only tools (screenshot, describe_screen) are exposed. Mutating tools are hidden from the MCP client entirely — it never sees them.
  • App blocking — blockedApps in permissions.json prevents the AI from interacting with sensitive apps like Wallet or Banking, even if mutating tools are allowed.
  • No root required — Runs as a regular user process using the macOS CGEvent API. No daemons, no kernel extensions, no root privileges — just Accessibility permissions.
  • Kill switch — Close iPhone Mirroring to kill all input instantly.

json
// ~/.mirroir-mcp/permissions.json
{
  "allow": ["tap", "swipe", "type_text", "press_key", "launch_app"],
  "deny": [],
  "blockedApps": ["Wallet", "Banking"]
}
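The fail-closed model can be sketched as a pair of checks. The exact server semantics may differ; this only illustrates the layering described above:

```python
# Sketch of the fail-closed permission model:
#  1. No config -> only read-only tools are exposed.
#  2. blockedApps rejects interaction even for allowed tools.

READ_ONLY = {"screenshot", "describe_screen"}

def exposed_tools(config):
    """Tools visible to the MCP client. No config means read-only only."""
    if config is None:
        return set(READ_ONLY)
    allowed = set(config.get("allow", [])) - set(config.get("deny", []))
    return READ_ONLY | allowed

def may_interact(config, tool, app):
    """A tool call succeeds only if the tool is exposed and the app isn't blocked."""
    if tool not in exposed_tools(config):
        return False
    if config and app in config.get("blockedApps", []):
        return False
    return True

config = {"allow": ["tap", "type_text"], "deny": [], "blockedApps": ["Wallet"]}

ok_no_config = may_interact(None, "tap", "Settings")  # False: tool never exposed
ok_tap = may_interact(config, "tap", "Settings")      # True: allowed, app not blocked
ok_wallet = may_interact(config, "tap", "Wallet")     # False: blocked app
```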

See Permissions and Security for the full threat model.

CLI Tools

Recorder

Record interactions as a skill file:

bash
mirroir record -o login-flow.yaml -n "Login Flow" --app "MyApp"

Doctor

Verify your setup:

bash
mirroir doctor
mirroir doctor --json    # machine-readable output

Configure

Set up your keyboard layout for non-US keyboards:

bash
mirroir configure

Updating

bash
# curl installer
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"

# npx
npx -y mirroir-mcp install

# Homebrew
brew upgrade mirroir-mcp

# From source
git pull && swift build -c release

Uninstall

bash
# Homebrew
brew uninstall mirroir-mcp

# From source
./uninstall-mirroir.sh

Configuration

All settings live in settings.json — project-local (.mirroir-mcp/settings.json) or global (~/.mirroir-mcp/settings.json). Project-local settings override global ones. Every setting also has a corresponding environment variable (e.g. MIRROIR_SCREEN_DESCRIBER_MODE).

json
{
  "screenDescriberMode": "auto",
  "agent": "embacle",
  "ocrBackend": "auto",
  "keystrokeDelayUs": 15000,
  "explorationMaxScreens": 30
}
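The layered resolution can be sketched as below. Project-local overriding global is stated in the docs; giving environment variables top precedence is an assumption made for this sketch:

```python
import os

def env_name(key):
    """camelCase key -> env var name, e.g. screenDescriberMode
    -> MIRROIR_SCREEN_DESCRIBER_MODE."""
    return "MIRROIR_" + "".join(
        "_" + c if c.isupper() else c.upper() for c in key
    )

def resolve(key, project, global_, env=os.environ):
    """Assumed precedence: env var > project-local file > global file."""
    if env_name(key) in env:
        return env[env_name(key)]
    if key in project:
        return project[key]
    return global_.get(key)

glob = {"screenDescriberMode": "auto", "ocrBackend": "vision"}
proj = {"screenDescriberMode": "ocr"}

mode = resolve("screenDescriberMode", proj, glob, env={})  # -> "ocr" (project wins)
backend = resolve("ocrBackend", proj, glob, env={})        # -> "vision" (global fallback)
```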

See Configuration Reference for all 40+ settings covering screen intelligence, input timing, scroll behavior, exploration budgets, AI providers, and keyboard layouts.

Documentation

| Doc | Covers |
| --- | --- |
| Tools Reference | All 32 tools, parameters, and input workflows |
| Configuration | All settings: screen intelligence, input timing, exploration, AI providers |
| FAQ | Security, focus stealing, keyboard layouts, embacle/vision mode |
| Security | Threat model, kill switch, and recommendations |
| Permissions | Fail-closed permission model and config file |
| Known Limitations | Focus stealing, keyboard layout gaps, autocorrect |
| Component Detection | Component definitions, calibration, and the detection pipeline |
| YOLO Icon Detection | Recommended YOLO models, CoreML setup, and configuration |
| Compiled Skills | Zero-OCR skill replay |
| Testing | FakeMirroring, integration tests, and CI strategy |
| Troubleshooting | Debug mode and common issues |
| Contributing | How to add tools, commands, and tests |
| Skills Marketplace | Skill format, plugin discovery, and authoring |

Community

Join the Discord server to ask questions, share skills, and discuss ideas.

Contributing

Contributions welcome. By submitting a patch, you agree to the Contributor License Agreement — your Git commit metadata serves as your electronic signature.


Why "mirroir"? — It's the old French spelling of miroir (mirror). A nod to the author's roots, not a typo.
