什么是 PageMap?
为 AI agents 提供结构化网页表示,显著压缩 HTML token 消耗,平均可减少约 97% 的页面解析成本。
README
PageMap
PageMap converts raw HTML (100K+ tokens) into structured, AI-readable page maps (2-5K tokens) — a 97% token reduction. It works as an MCP server, Python SDK, and CLI, supporting 16 page types and 30+ e-commerce sites. Agents can read, click, type, and navigate any web page.
"Give your agent eyes and hands on the web."
<!-- ============================================================ --> <!-- HUMAN GUIDE --> <!-- ============================================================ -->
Why PageMap?
Playwright MCP dumps 50-540KB accessibility snapshots per page, overflowing context windows after 2-3 navigations. Firecrawl and Jina convert HTML to markdown — read-only, no interaction.
PageMap gives your agent a compressed, actionable view of any web page:
| PageMap | Playwright MCP | Firecrawl | Jina Reader | |
|---|---|---|---|---|
| Tokens / page | 2-5K | 6-50K | 10-50K | 10-50K |
| Interaction | click / type / select / hover | Raw tree parsing | Read-only | Read-only |
| Multi-page sessions | Unlimited | Breaks at 2-3 pages | N/A | N/A |
| Task success (94 tasks) | 84.7% | 61.5% | 64.5% | 57.8% |
| Avg tokens / task | 2,710 | 13,737 | 13,888 | 11,424 |
| Cost / 94 tasks | $1.06 | $4.09 | $3.98 | $2.26 |
Benchmarked across 11 e-commerce sites, 94 static tasks, 7 conditions. 8,100+ tests passing.
Quick Start
Chromium is auto-installed on first use — no manual playwright install needed.
Install
pip install retio-pagemap
MCP Client Config
Add to Claude Code, Cursor, Windsurf, or Claude Desktop:
{
"mcpServers": {
"pagemap": {
"command": "uvx",
"args": ["retio-pagemap"]
}
}
}
Claude Desktop (macOS): Use the absolute path to
uvx— runwhich uvx(e.g./opt/homebrew/bin/uvx).
VS Code (Copilot): Use
"servers"instead of"mcpServers"in.vscode/mcp.json.
Docker
docker run -p 8000:8000 retio1001/pagemap --transport http
Features
13 MCP Tools — Read + Interact
Not just reading — your agent can click buttons, fill forms, select options, manage tabs, and navigate across pages. 13 tools cover the full browsing workflow:
get_page_map · execute_action · fill_form · scroll_page · wait_for · take_screenshot · get_page_state · navigate_back · batch_get_page_map · open_tab · switch_tab · list_tabs · close_tab
16 Page Types, Auto-Detected
PageMap automatically classifies pages and applies optimized extraction for each type:
product_detail · listing · search_results · article · news · video · login · form · checkout · dashboard · help_faq · settings · error · documentation · landing · blocked
E-Commerce Deep Coverage
Built-in support for 30+ major e-commerce sites across 4 tiers:
- Global mega-platforms — Amazon, eBay, AliExpress, SHEIN, Walmart, Rakuten
- Global fashion — Zara, H&M, Nike, Uniqlo, ASOS, Zalando, SSENSE, Farfetch, COS
- Korea — Coupang, Naver Shopping, Musinsa, 29CM, W Concept, SSG, 11st
- Japan/China — ZOZO, Tmall, JD.com, Taobao
Structured extraction of prices, options (size/color), ratings, availability — with automatic cookie consent handling and login barrier detection.
Smart Recovery
PageMap detects problems and tells your agent what to do:
- Barrier detection — Login required? Bot blocked? Out of stock? Age verification? Popup overlay? PageMap adds a
barrierfield with the diagnosis and suggested next steps - Cookie consent auto-dismiss — 7 CMP providers auto-detected (Cookiebot, OneTrust, TrustArc, Didomi, Quantcast, Usercentrics, generic fallback). 5-tier dismiss cascade: CMP JS API → Reject → Accept → Dismiss → Close symbol. GDPR reject-first default policy
- Popup overlay detection — AX tree
role="dialog"+ HTML regex 2-phase detection. Promotional popups (newsletter, exit-intent) auto-dismissed - Bot detection awareness — Detects Cloudflare, Turnstile, reCAPTCHA, hCaptcha, and Akamai. Reports the provider and suggests wait/retry strategies
- Stale ref recovery — When DOM changes invalidate refs, PageMap returns clear guidance to re-fetch
Content Intelligence
- 8 JSON-LD schemas — Product, NewsArticle, VideoObject, FAQPage, Event, LocalBusiness, BreadcrumbList, and ItemList
- Metadata extraction — Prices, ratings, reviews, descriptions, images from structured data and DOM fallbacks
- 2-layer caching — Cache hit (~10ms), content refresh (~500ms), full rebuild (~1.5s). Diff-based updates for unchanged sections
- Delta evidence packet output - Optional
to_delta_packet()serializer emits digest-bound evidence units, claim candidates, provenance, and authority flags for downstream memory/review systems without changing the default MCP output
10 Languages
Locale auto-detected from URL. Token budgets adjusted for CJK scripts.
| Language | Locale | Language | Locale |
|---|---|---|---|
| English | en | Chinese | zh |
| Korean | ko | Spanish | es |
| Japanese | ja | Italian | it |
| French | fr | Portuguese | pt |
| German | de | Dutch | nl |
Deployment
Local (STDIO)
Default mode. Runs as a local MCP server — no server setup needed.
retio-pagemap
Docker
docker run -p 8000:8000 retio1001/pagemap --transport http
Multi-architecture images (amd64/arm64) available on Docker Hub and GitHub Container Registry.
Python API
import asyncio
from pagemap.browser_session import BrowserSession
from pagemap.delta_serializer import to_delta_packet
from pagemap.page_map_builder import build_page_map_live
from pagemap.serializer import to_agent_prompt, to_json
async def main():
async with BrowserSession() as session:
page_map = await build_page_map_live(session, "https://example.com/product/123")
print(to_agent_prompt(page_map)) # Agent-optimized text format
print(to_json(page_map)) # Structured JSON
print(to_delta_packet(page_map)) # Digest-bound evidence packet
print(page_map.page_type) # "product_detail"
print(page_map.interactables) # [Interactable(ref=1, role="button", ...)]
print(page_map.metadata) # {"name": "...", "price": "..."}
asyncio.run(main())
For offline processing (no browser):
from pagemap.page_map_builder import build_page_map_offline
page_map = build_page_map_offline(open("page.html").read(), url="https://example.com/product/123")
Security
PageMap treats all web content as untrusted input:
- SSRF defense — Multi-layer protection against server-side request forgery
- Prompt injection defense — Content boundaries, role-prefix stripping, suspicious content flagging
- robots.txt compliance — RFC 9309 compliant.
--ignore-robotsopt-out flag - Resource guards — DOM node limit, HTML size limit, response size limit
- Session isolation — Each session has independent cookies and storage, automatically cleaned up
Local development: Private IPs are blocked by default. Use --allow-local or PAGEMAP_ALLOW_LOCAL=1.
Disclaimer
Users are responsible for complying with the terms of service of target websites and all applicable laws when using PageMap.
Troubleshooting
"spawn uvx ENOENT" (Claude Desktop on macOS) — Claude Desktop does not inherit your shell PATH. Run which uvx and use the absolute path in your config.
First page takes a long time — Chromium cold start takes ~10-30s on first navigation. Subsequent pages load in 1-3 seconds.
Localhost blocked — Use --allow-local flag or set PAGEMAP_ALLOW_LOCAL=1.
Chromium not found — Run pip install retio-pagemap && playwright install chromium to install manually.
Requirements
- Python 3.11+
- Chromium (auto-installed on first use)
Community
Have a question or idea? Join the conversation in GitHub Discussions.
Development
git clone https://github.com/Retio-ai/Retio-pagemap.git
cd Retio-pagemap
uv sync --group dev
playwright install chromium
uv run pytest --tb=short -q
Pricing
Local (STDIO) — Free forever. Self-hosted, open source under AGPL-3.0.
Cloud API — Hosted multi-tenant server with auth, rate limiting, and credit-based billing. Contact retio1001@retio.ai for access.
License
AGPL-3.0-only — see LICENSE for the full text.
For commercial licensing options, contact retio1001@retio.ai.
<!-- ============================================================ --> <!-- AGENT REFERENCE --> <!-- ============================================================ -->
For Agents
This section is written for AI agents using PageMap as an MCP tool.
Tools
| Tool | When to use |
|---|---|
get_page_map | Start here. Navigate to a URL and get a full structured map with numbered refs. |
execute_action | Click, type, select, or hover using a ref number from the last get_page_map. |
fill_form | Fill multiple form fields in one call. More efficient than sequential execute_action calls. |
get_page_state | Check current URL and title without a full rebuild. Use after actions that may navigate. |
scroll_page | Scroll to reveal lazy-loaded content before calling get_page_map again. |
wait_for | Wait for dynamic content to appear (e.g. after a search or form submit). |
take_screenshot | Capture the visual state when the PageMap alone is ambiguous. |
navigate_back | Go back one step in browser history. |
open_tab | Open a new browser tab and navigate to a URL. |
switch_tab | Switch to a different open tab by index. |
list_tabs | List all open tabs with their URLs and titles. |
close_tab | Close a tab by index. |
batch_get_page_map | Fetch multiple URLs in parallel. Use for comparison tasks. |
Output Format
URL: https://example.com/product/123
Title: Product Name
Type: product_detail # auto-detected page type
## Actions
[1] button: Add to cart (click)
[2] select: Size (select) — options: S, M, L, XL
[3] link: See all reviews (click)
...
## Info
Price: $49.99
Rating: 4.5 / 5 (128 reviews)
Description: ...
## Images
[1] https://cdn.example.com/product.jpg
## Meta
Tokens: ~1,800 | Interactables: 24 | Generation: 380ms
## Actions— Every interactive element on the page with a stablerefnumber.## Info— Key page content extracted from HTML: prices, titles, ratings, descriptions.## Images— Product/content image URLs.## Meta— Token count, interactable count, generation time.
Barrier Detection
When PageMap encounters a page-level obstacle, it includes a barrier field in the response:
State:
barrier: login_required
barrier_hint: "Login form detected with email + password fields. Use fill_form to authenticate."
Possible barriers: cookie_consent, login_required, bot_blocked, out_of_stock, empty_results, error_page, age_verification, region_restricted, popup_overlay.
When you see a barrier: follow the barrier_hint guidance. For bot_blocked, wait and retry. For login_required, use fill_form with credentials.
Ref Lifecycle
Refs are assigned by get_page_map and remain valid until the page state changes.
Refs are invalidated when:
- The page navigates to a new URL
- A DOM mutation occurs (modal opens, SPA navigation, accordion toggles)
execute_actioncauses a page-level change
When you get a stale ref error: call get_page_map again to get fresh refs before retrying.
Token Budget Behavior
When a page exceeds the token budget, content is pruned in this order:
- Navigation menus, footers, sidebars removed first
- Secondary body content trimmed
## Actionsand## Infoare always preserved
If key content seems missing, try scroll_page to load lazy content, then get_page_map again.
Recommended Workflow
1. get_page_map(url) → read Actions + Info, pick refs
2. execute_action(ref, ...) → interact
3. get_page_state() → confirm navigation occurred
4. get_page_map(new_url) → get fresh refs for next step
For pages with dynamic content (search results, filters):
1. get_page_map(url)
2. execute_action(ref, "click") → trigger search/filter
3. wait_for(text="results") → wait for content
4. get_page_map(url) → get updated map
Known Limitations
- Login-gated pages — PageMap does not manage sessions or cookies. Authentication must be handled externally.
- Heavy bot detection (Cloudflare, Akamai) — May block automated access. PageMap detects the provider and suggests strategies, but cannot bypass active bot mitigation.
- Private network access — Blocked by default. Requires
--allow-localflag. - iframes — Cross-origin iframes are not accessible due to browser security policies.
PageMap — Structured Web Intelligence for the Agent Era.
常见问题
PageMap 是什么?
为 AI agents 提供结构化网页表示,显著压缩 HTML token 消耗,平均可减少约 97% 的页面解析成本。
相关 Skills
Claude接口
by anthropics
面向接入 Claude API、Anthropic SDK 或 Agent SDK 的开发场景,自动识别项目语言并给出对应示例与默认配置,快速搭建 LLM 应用。
✎ 想把Claude能力接进应用或智能体,用claude-api上手快、兼容Anthropic与Agent SDK,集成路径清晰又省心
RAG架构师
by alirezarezvani
聚焦生产级RAG系统设计与优化,覆盖文档切块、检索链路、索引构建、召回评估等关键环节,适合搭建可扩展、高准确率的知识库问答与检索增强应用。
✎ 面向RAG落地,把知识库、向量检索和生成链路系统串联起来,做架构设计时更清晰,也更少踩坑。
多智能体架构
by alirezarezvani
聚焦多智能体系统架构设计,梳理 Supervisor、Swarm、分层和 Pipeline 等模式,覆盖角色定义、通信协作与性能评估,适合规划稳健可扩展的 AI agent 编排方案。
✎ 帮你系统解决多智能体应用的架构设计与协同编排难题,适合构建复杂 AI 工作流,成熟度高、社区认可也很亮眼。
相关 MCP Server
知识图谱记忆
编辑精选by Anthropic
Memory 是一个基于本地知识图谱的持久化记忆系统,让 AI 记住长期上下文。
✎ 帮 AI 和智能体补上“记不住”的短板,用本地知识图谱沉淀长期上下文,连续对话更聪明,数据也更可控。
顺序思维
编辑精选by Anthropic
Sequential Thinking 是让 AI 通过动态思维链解决复杂问题的参考服务器。
✎ 这个服务器展示了如何让 Claude 像人类一样逐步推理,适合开发者学习 MCP 的思维链实现。但注意它只是个参考示例,别指望直接用在生产环境里。
PraisonAI
编辑精选by mervinpraison
PraisonAI 是一个支持自反思和多 LLM 的低代码 AI 智能体框架。
✎ 如果你需要快速搭建一个能 24/7 运行的 AI 智能体团队来处理复杂任务(比如自动研究或代码生成),PraisonAI 的低代码设计和多平台集成(如 Telegram)让它上手极快。但作为非官方项目,它的生态成熟度可能不如 LangChain 等主流框架,适合愿意尝鲜的开发者。