io.github.ikoskela/precision-desktop
Coding and Debugging, by ikoskela
Fixes DPI coordinate scaling for Windows desktop automation MCP servers.
precision-desktop
mcp-name: io.github.ikoskela/precision-desktop
A companion MCP server that fixes DPI coordinate scaling for Windows desktop automation.
Windows DPI scaling silently breaks every MCP tool that clicks, types, or hovers on the desktop. precision-desktop detects and corrects the coordinate mismatch so your AI agent's clicks actually land where they should.
The Problem
Windows has two coordinate systems and doesn't tell you which one you're using.
When Windows DPI scaling is set above 100% (which it is on most modern laptops and monitors), different Windows APIs return coordinates in different systems:
| Coordinate System | Used By | Example: Point at 50% across a 4K display |
|---|---|---|
| Physical (pixels) | Mouse events, SetCursorPos, UI Automation* | 1920, 1080 |
| Logical (DPI-scaled) | GetWindowRect, Cursor.Position, .NET, screenshots | 1097, 617 (at 175% scaling) |
* UI Automation returns physical on DPI-aware processes, logical on others — yet another inconsistency.
The ratio between them is your DPI scale factor (e.g., 1.25x, 1.5x, 1.75x, 2.0x).
The problem isn't just "some APIs return logical." It's that different APIs return different coordinate systems with no indication of which one you're getting. There's no flag, no header, no type annotation — just numbers that look identical but mean completely different things.
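As a minimal sketch of the relationship between the two systems, the conversion is a multiply or divide by the scale factor. The function names below are illustrative and are not taken from precision-desktop's source; the numbers reproduce the table's 175% example.

```python
from typing import Tuple


def physical_to_logical(x: int, y: int, scale: float) -> Tuple[int, int]:
    """Divide physical pixels by the DPI scale factor to get logical units."""
    return round(x / scale), round(y / scale)


def logical_to_physical(x: int, y: int, scale: float) -> Tuple[int, int]:
    """Multiply logical units by the DPI scale factor to get physical pixels."""
    return round(x * scale), round(y * scale)


SCALE = 1.75  # 175% scaling, as in the table above

center_logical = physical_to_logical(1920, 1080, SCALE)
print(center_logical)  # (1097, 617)
print(logical_to_physical(*center_logical, SCALE))  # (1920, 1080)
```

Rounding means a round trip is only guaranteed to land within a pixel or so of the starting point, which is usually well inside a clickable target.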
Three ways this breaks AI agents
1. API mismatch — Click tools accept physical coordinates, but common Windows APIs return logical:
What the AI wants to click: [Button at physical (1920, 1080)]
What GetWindowRect says: [Button at logical (1097, 617)]
Where the click lands: (1097, 617) in physical space
^^^^^^^^^ WRONG - completely different spot
2. Screenshot mismatch — Screen captures are taken at logical resolution, but click tools expect physical coordinates. On a 3840x2400 display at 175% scaling, screenshots are 2194x1371 pixels. When a vision model (GPT-4o, Claude, CogAgent) looks at a screenshot and estimates "the button is at pixel (500, 300)," that's a logical coordinate. Clicking there in physical space misses by hundreds of pixels:
Vision model sees: [Button at (500, 300) in screenshot]
Screenshot space: Logical (2194x1371)
Click-Tool expects: Physical (3840x2400)
Correct click: (875, 525) ← needs 1.75x conversion
3. Mixed sources within the same tool — Even a single tool can return both systems. For example, windows-mcp's State-Tool reports element coordinates in physical space (correct for clicking), but its screenshots are captured at logical resolution. An agent combining both — reading element positions from the accessibility tree and estimating positions from screenshots — will silently mix coordinate systems.
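The screenshot case can be sketched by deriving the scale from the two resolutions instead of assuming it. This is an illustration only: `screenshot_to_physical` is a hypothetical helper, not part of precision-desktop, and the sizes are the ones quoted in the example above.

```python
from typing import Tuple

Size = Tuple[int, int]


def screenshot_to_physical(px: int, py: int,
                           screenshot_size: Size,
                           physical_size: Size) -> Tuple[int, int]:
    """Map a pixel estimated on a screenshot to physical screen coordinates."""
    scale_x = physical_size[0] / screenshot_size[0]
    scale_y = physical_size[1] / screenshot_size[1]
    return round(px * scale_x), round(py * scale_y)


# A vision model reports (500, 300) on a 2194x1371 screenshot of a 3840x2400 display.
click = screenshot_to_physical(500, 300, (2194, 1371), (3840, 2400))
print(click)  # (875, 525)
```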
This isn't an edge case
- ~1 billion active Windows devices worldwide
- ~30-50% have DPI scaling above 100% — that's 300-500 million machines
- Windows auto-enables scaling >100% on most modern laptops (13-16" screens at 1080p+ get 125-150% by default)
- 47% of PC users now run resolutions above 1080p (Steam Hardware Survey) — 21% at 1440p, 5% at 2560x1600, 4.2% at 4K — and climbing
- The MCP ecosystem has 5,800+ servers and 97M+ monthly SDK downloads — every Windows desktop automation server will hit this
If you've ever watched an AI agent click confidently at exactly the wrong spot, DPI scaling is probably why.
The Solution
precision-desktop is a companion MCP server that sits alongside your desktop automation MCP (like windows-mcp) and provides:
- Calibration — Measure the actual DPI scale factor on this specific machine using known screen landmarks
- Coordinate conversion — Convert between physical and logical systems on demand
- UI element finding — Locate elements by name via Windows UI Automation, returning physical coordinates ready for clicking
- Health checks — Detect stale calibration, verify UI Automation availability, check companion MCP status
- Patch awareness — Track which DPI-aware patches have been applied to the companion MCP
Tools
| Tool | Description |
|---|---|
| calibrate | Compute DPI scale factors from 2+ reference points with known physical and logical coordinates |
| calibrate_verify | Mark calibration as verified after confirming a test click landed correctly |
| get_calibration | Read current calibration state (scale factors, verification status, age) |
| convert_coordinates | Convert a coordinate pair between physical and logical systems |
| find_ui_element | Find a single UI element by name using Windows UI Automation. Returns physical coordinates |
| find_all_ui_elements | Find all UI elements matching a name. Returns list with physical coordinates |
| list_ui_elements | List all named interactive elements (buttons, text fields, etc.) in a window |
| find_window | Find a window handle (hwnd) by title substring |
| health_check | Run environment checks: calibration freshness, UI Automation, companion MCP status |
| patch_status | Check which DPI-aware patches are applied to the companion MCP |
Quick Start
1. Install
Clone this repo and install dependencies:
git clone https://github.com/ikoskela/precision-desktop.git
cd precision-desktop
pip install -e .
2. Configure MCP
Add to your Claude Code MCP configuration (.mcp.json or settings):
{
  "mcpServers": {
    "precision-desktop": {
      "command": "python",
      "args": ["C:/path/to/precision-desktop/server.py"]
    }
  }
}
Or if using Claude Desktop, add to claude_desktop_config.json:
{
  "mcpServers": {
    "precision-desktop": {
      "command": "python",
      "args": ["C:\\path\\to\\precision-desktop\\server.py"]
    }
  }
}
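Windows paths in JSON need either forward slashes or doubled backslashes, which is easy to get wrong by hand. One way to avoid that, sketched here with a hypothetical `build_mcp_config` helper, is to let `json.dumps` do the escaping. The path is the same placeholder used above.

```python
import json
from pathlib import PureWindowsPath


def build_mcp_config(repo_dir: str) -> str:
    """Return an mcpServers config snippet with a correctly escaped Windows path."""
    server_script = PureWindowsPath(repo_dir) / "server.py"
    config = {
        "mcpServers": {
            "precision-desktop": {
                "command": "python",
                "args": [str(server_script)],
            }
        }
    }
    # json.dumps turns each backslash into the doubled form JSON requires.
    return json.dumps(config, indent=2)


print(build_mcp_config(r"C:\path\to\precision-desktop"))
```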
3. Calibrate
The AI agent calibrates itself on first use. The flow:
- Agent calls health_check: discovers calibration is missing
- Agent gathers reference points: uses a coordinate reading tool (like MPos) or Move-Tool + Cursor.Position to get both physical and logical coordinates at 2+ known screen landmarks
- Agent calls calibrate with the reference points: computes scale factors
- Agent verifies: moves cursor to a known element, confirms it landed correctly, calls calibrate_verify
Calibration persists in state/calibration.json and only needs to be redone if DPI settings change or the display configuration changes.
Calibration Guide
What you need
Two coordinate readings for the same point:
- Physical coordinates: from Move-Tool, Click-Tool, or State-Tool (these operate in physical space)
- Logical coordinates: from [System.Windows.Forms.Cursor]::Position in PowerShell, or GetWindowRect API calls
Good calibration landmarks
- Start button (bottom-left of taskbar) — easy to locate precisely
- Date/time (bottom-right of taskbar) — anchors the opposite corner
- Minimize button of any maximized window — well-defined clickable target
Example calibration call
{
  "points": [
    {
      "physical_x": 38,
      "physical_y": 2365,
      "logical_x": 21,
      "logical_y": 1351,
      "label": "start_button"
    },
    {
      "physical_x": 3691,
      "physical_y": 2332,
      "logical_x": 2109,
      "logical_y": 1332,
      "label": "datetime"
    }
  ]
}
This computes a scale factor (here, ~1.75x) and persists it for future coordinate conversions.
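MCP clients normally build the request for you, but for debugging it can help to see how these arguments travel on the wire. The sketch below wraps the example points in a JSON-RPC 2.0 `tools/call` request, the method MCP uses for tool invocation. `make_tool_call` is a hypothetical helper, and the request id is arbitrary.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool_name: str,
                   arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a plain JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Reference points copied from the example calibration call above.
points = [
    {"physical_x": 38, "physical_y": 2365,
     "logical_x": 21, "logical_y": 1351, "label": "start_button"},
    {"physical_x": 3691, "physical_y": 2332,
     "logical_x": 2109, "logical_y": 1332, "label": "datetime"},
]

request = make_tool_call(1, "calibrate", {"points": points})
print(json.dumps(request, indent=2))
```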
Verification
After calibration, the agent should:
- Use convert_coordinates to convert a known logical position to physical
- Use Move-Tool to move the cursor there
- Confirm the cursor is on the expected target
- Call calibrate_verify with success: true
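The "confirm the cursor is on the expected target" step can be sketched as a distance check. This is an assumption about how an agent might decide, not a description of precision-desktop's behavior: the 5-pixel tolerance and both function names are made up for illustration, and only the `success` argument comes from the steps above.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def cursor_on_target(expected: Point, actual: Point,
                     tolerance_px: float = 5.0) -> bool:
    """True if the observed cursor position is within tolerance of the target."""
    return math.dist(expected, actual) <= tolerance_px


def verify_arguments(expected: Point, actual: Point) -> Dict[str, bool]:
    """Arguments an agent could pass to calibrate_verify after a test move."""
    return {"success": cursor_on_target(expected, actual)}


print(verify_arguments((875, 525), (876, 524)))  # {'success': True}
print(verify_arguments((875, 525), (500, 300)))  # {'success': False}
```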
UI Element Finding
Beyond coordinate conversion, precision-desktop can locate elements directly by name using Windows UI Automation — no coordinates needed.
Agent: find_ui_element(element_name="Save", window_title="Notepad")
→ { "name": "Save", "center_x": 450, "center_y": 32, "control_type": "button", ... }
(coordinates are physical, ready for Click-Tool)
This works for:
- Native Windows application controls (buttons, text fields, menus)
- Chrome extension popups and dialogs
- Overlay windows that State-Tool may not see
- Any UI element exposed via Windows UI Automation
Scoped search
You can scope searches to a specific window to avoid finding elements in the wrong application:
find_ui_element(element_name="Submit", window_title="My App")
find_ui_element(element_name="Close", window_handle=12345)
Discovery
Don't know the element name? Use list_ui_elements to see what's available:
list_ui_elements(window_title="Settings")
→ [{ "name": "General", "control_type": "tab item", ... },
{ "name": "Apply", "control_type": "button", ... }, ...]
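Once an agent has the discovery output, picking candidates is a simple filter. The sketch below assumes only the two fields shown in the example result (`name` and `control_type`); `names_by_control_type` is a hypothetical helper.

```python
from typing import Dict, List


def names_by_control_type(elements: List[Dict[str, str]],
                          control_type: str) -> List[str]:
    """Return the names of all listed elements with the given control type."""
    return [e["name"] for e in elements if e.get("control_type") == control_type]


# Shape taken from the list_ui_elements example above.
elements = [
    {"name": "General", "control_type": "tab item"},
    {"name": "Apply", "control_type": "button"},
]

print(names_by_control_type(elements, "button"))  # ['Apply']
```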
Integration with windows-mcp
precision-desktop is designed to work alongside windows-mcp (or any MCP that provides desktop click/type/scroll tools).
The patching concept
Rather than forking windows-mcp, precision-desktop describes patch intents — what should change in the companion MCP and why. The AI agent reads these intents and applies version-appropriate patches itself.
Current patch intents:
- dpi_awareness: Add an optional coordinate_system parameter to Click-Tool and Move-Tool that auto-converts logical coordinates to physical
- find_and_click: Add an optional element_name parameter to Click-Tool that finds and clicks an element by name (no coordinates needed)
Use patch_status to check which patches are applied.
Environment variable
If windows-mcp is installed in a non-standard location, set the WINDOWS_MCP_PATH environment variable:
WINDOWS_MCP_PATH=C:\path\to\windows-mcp
By default, precision-desktop looks in the standard Claude Extensions directory (%APPDATA%\Claude\Claude Extensions\ant.dir.cursortouch.windows-mcp).
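The lookup order described here (environment variable first, then the default directory) might be sketched as follows. `resolve_windows_mcp_path` is a hypothetical name, and the real implementation may differ, for example in how it handles a missing APPDATA.

```python
import os
from pathlib import PureWindowsPath
from typing import Mapping

# Default location relative to %APPDATA%, as stated above.
DEFAULT_SUBPATH = ("Claude", "Claude Extensions", "ant.dir.cursortouch.windows-mcp")


def resolve_windows_mcp_path(env: Mapping[str, str] = os.environ) -> str:
    """Prefer WINDOWS_MCP_PATH, otherwise fall back to the default directory."""
    override = env.get("WINDOWS_MCP_PATH")
    if override:
        return override
    appdata = env.get("APPDATA", "")
    return str(PureWindowsPath(appdata, *DEFAULT_SUBPATH))


print(resolve_windows_mcp_path({"APPDATA": r"C:\Users\me\AppData\Roaming"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.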
Architecture
precision-desktop/
├── server.py # MCP server entry point — tool definitions and routing
├── calibration.py # DPI calibration: compute, persist, convert coordinates
├── find_element.py # Windows UI Automation: find elements, list elements, find windows
├── health_check.py # Environment checks: calibration, UI Automation, companion MCP
├── patches/
│ └── windows_mcp.py # LLM-adaptive patch intents for windows-mcp
├── state/
│ └── calibration.json # Persisted calibration data (user-specific, gitignored)
└── pyproject.toml
How calibration works
- User (or AI agent) provides 2+ points with both physical and logical coordinates
- calibration.py computes the median scale factor for X and Y axes independently
- Checks consistency: if points disagree by more than 2%, flags as inconsistent
- Computes offset (typically 0 for standard DPI scaling, non-zero for multi-monitor setups)
- Persists to state/calibration.json
- Subsequent convert_coordinates calls use the persisted factors
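The steps above can be sketched roughly as below. This is not the code in calibration.py: the result keys, the definition of "disagree" (spread of per-point ratios relative to the median), and the offset formula are all assumptions, and the three reference points are invented to sit exactly at 1.75x. Per-point ratios only approximate the scale when the offset is near zero, and every logical coordinate must be non-zero.

```python
from statistics import median
from typing import Dict, List, Union

Number = Union[int, float]


def compute_calibration(points: List[Dict[str, Number]],
                        tolerance: float = 0.02) -> Dict[str, Union[float, bool]]:
    """Median scale, median offset, and a consistency flag for each axis."""
    if len(points) < 2:
        raise ValueError("need at least 2 reference points")
    result: Dict[str, Union[float, bool]] = {}
    for axis in ("x", "y"):
        ratios = [p[f"physical_{axis}"] / p[f"logical_{axis}"] for p in points]
        scale = median(ratios)
        # Assumed consistency rule: ratios may differ by at most `tolerance`.
        spread = (max(ratios) - min(ratios)) / scale
        offsets = [p[f"physical_{axis}"] - scale * p[f"logical_{axis}"]
                   for p in points]
        result[f"scale_{axis}"] = scale
        result[f"offset_{axis}"] = median(offsets)
        result[f"consistent_{axis}"] = spread <= tolerance
    return result


# Hypothetical landmarks, chosen so every ratio is exactly 1.75.
points = [
    {"physical_x": 175, "physical_y": 350, "logical_x": 100, "logical_y": 200},
    {"physical_x": 1750, "physical_y": 1400, "logical_x": 1000, "logical_y": 800},
    {"physical_x": 3500, "physical_y": 2100, "logical_x": 2000, "logical_y": 1200},
]
print(compute_calibration(points))
```

Using the median rather than the mean keeps one badly measured landmark from skewing the result, which is presumably why the README specifies it.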
How UI Automation finding works
- find_element.py runs PowerShell scripts that load UIAutomationClient and UIAutomationTypes assemblies
- Searches the UI Automation tree for elements matching the requested name
- Returns bounding rectangles in the coordinate system that UI Automation reports (which is physical on DPI-aware processes)
- Results include center coordinates ready for direct use with Click-Tool/Move-Tool
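The last step, turning a bounding rectangle into a click point, is simple arithmetic. In the sketch below the `Rect` type and the rectangle values are hypothetical; they were chosen so the result matches the (450, 32) center from the earlier Save button example.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Bounding rectangle edges, in whatever system UI Automation reported."""
    left: int
    top: int
    right: int
    bottom: int


def center_of(rect: Rect) -> Tuple[int, int]:
    """Integer center point of a bounding rectangle."""
    return (rect.left + rect.right) // 2, (rect.top + rect.bottom) // 2


save_button = Rect(left=420, top=20, right=480, bottom=44)
print(center_of(save_button))  # (450, 32)
```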
Requirements
- Windows 10/11 (uses Windows UI Automation)
- Python 3.10+
- PowerShell (ships with Windows)
- mcp >= 1.0.0 (MCP SDK)
- MPos or similar coordinate reader (for calibration) — any tool that shows the cursor's screen position in both physical and logical coordinates. MPos is lightweight and portable (no install needed)
License