PageMap

by retio-ai

Gives AI agents a structured representation of web pages that sharply reduces HTML token consumption, cutting page-parsing cost by about 97% on average.

<!-- mcp-name: io.github.Retio-ai/pagemap -->

PageMap converts raw HTML (100K+ tokens) into structured, AI-readable page maps (2-5K tokens) — a 97% token reduction. It works as an MCP server, Python SDK, and CLI, supporting 16 page types and 30+ e-commerce sites. Agents can read, click, type, and navigate any web page.

"Give your agent eyes and hands on the web."


<!-- ============================================================ --> <!-- HUMAN GUIDE --> <!-- ============================================================ -->

Why PageMap?

Playwright MCP dumps 50-540KB accessibility snapshots per page, overflowing context windows after 2-3 navigations. Firecrawl and Jina convert HTML to markdown — read-only, no interaction.

PageMap gives your agent a compressed, actionable view of any web page:

| | PageMap | Playwright MCP | Firecrawl | Jina Reader |
|---|---|---|---|---|
| Tokens / page | 2-5K | 6-50K | 10-50K | 10-50K |
| Interaction | click / type / select / hover | Raw tree parsing | Read-only | Read-only |
| Multi-page sessions | Unlimited | Breaks at 2-3 pages | N/A | N/A |
| Task success (94 tasks) | 84.7% | 61.5% | 64.5% | 57.8% |
| Avg tokens / task | 2,710 | 13,737 | 13,888 | 11,424 |
| Cost / 94 tasks | $1.06 | $4.09 | $3.98 | $2.26 |

Benchmarked across 11 e-commerce sites, 94 static tasks, 7 conditions. 8,100+ tests passing.
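
The benchmark figures can be turned into relative savings with a little arithmetic. The sketch below uses only the numbers quoted in the comparison above; the function and variable names are illustrative and not part of PageMap.

```python
# Figures copied from the 94-task benchmark comparison.
avg_tokens = {
    "PageMap": 2710,
    "Playwright MCP": 13737,
    "Firecrawl": 13888,
    "Jina Reader": 11424,
}
total_cost_usd = {
    "PageMap": 1.06,
    "Playwright MCP": 4.09,
    "Firecrawl": 3.98,
    "Jina Reader": 2.26,
}
TASKS = 94


def token_savings(baseline: str) -> float:
    """Fraction of tokens per task that PageMap saves relative to `baseline`."""
    return 1 - avg_tokens["PageMap"] / avg_tokens[baseline]


def cost_per_task(tool: str) -> float:
    """Average cost in USD of one benchmark task."""
    return total_cost_usd[tool] / TASKS


if __name__ == "__main__":
    for tool in avg_tokens:
        line = f"{tool}: ${cost_per_task(tool):.4f} per task"
        if tool != "PageMap":
            line += f", PageMap uses {token_savings(tool):.1%} fewer tokens"
        print(line)
```

On these numbers PageMap uses roughly 76-80% fewer tokens per task than the other three tools, at a little over one cent per task.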


Quick Start

Chromium is auto-installed on first use — no manual playwright install needed.

Install

```bash
pip install retio-pagemap
```

MCP Client Config

Add to Claude Code, Cursor, Windsurf, or Claude Desktop:

```json
{
  "mcpServers": {
    "pagemap": {
      "command": "uvx",
      "args": ["retio-pagemap"]
    }
  }
}
```

Claude Desktop (macOS): Use the absolute path to uvx — run which uvx (e.g. /opt/homebrew/bin/uvx).

VS Code (Copilot): Use "servers" instead of "mcpServers" in .vscode/mcp.json.
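
The two client-specific notes above can be captured in a small helper that generates the right config per client. This is a minimal sketch: `pagemap_mcp_config` and its `client` values are hypothetical names chosen for illustration, not something PageMap ships.

```python
import json
import shutil


def pagemap_mcp_config(client: str = "default") -> dict:
    """Build the MCP client config for a given client.

    `client` is an illustrative switch, not a PageMap option:
    "claude-desktop-macos" resolves the absolute uvx path,
    "vscode" uses the "servers" top-level key.
    """
    command = "uvx"
    if client == "claude-desktop-macos":
        # Claude Desktop does not inherit the shell PATH, so resolve uvx now.
        command = shutil.which("uvx") or "uvx"
    top_level_key = "servers" if client == "vscode" else "mcpServers"
    return {
        top_level_key: {
            "pagemap": {"command": command, "args": ["retio-pagemap"]}
        }
    }


if __name__ == "__main__":
    print(json.dumps(pagemap_mcp_config("vscode"), indent=2))
```

If `uvx` is not on the PATH of the process running the helper, it falls back to the bare command name, so the output should still be checked by hand.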

Docker

```bash
docker run -p 8000:8000 retio1001/pagemap --transport http
```

Features

13 MCP Tools — Read + Interact

Not just reading — your agent can click buttons, fill forms, select options, manage tabs, and navigate across pages. 13 tools cover the full browsing workflow:

get_page_map · execute_action · fill_form · scroll_page · wait_for · take_screenshot · get_page_state · navigate_back · batch_get_page_map · open_tab · switch_tab · list_tabs · close_tab

16 Page Types, Auto-Detected

PageMap automatically classifies pages and applies optimized extraction for each type:

product_detail · listing · search_results · article · news · video · login · form · checkout · dashboard · help_faq · settings · error · documentation · landing · blocked

E-Commerce Deep Coverage

Built-in support for 30+ major e-commerce sites across 4 tiers:

  • Global mega-platforms — Amazon, eBay, AliExpress, SHEIN, Walmart, Rakuten
  • Global fashion — Zara, H&M, Nike, Uniqlo, ASOS, Zalando, SSENSE, Farfetch, COS
  • Korea — Coupang, Naver Shopping, Musinsa, 29CM, W Concept, SSG, 11st
  • Japan/China — ZOZO, Tmall, JD.com, Taobao

Structured extraction of prices, options (size/color), ratings, availability — with automatic cookie consent handling and login barrier detection.

Smart Recovery

PageMap detects problems and tells your agent what to do:

  • Barrier detection — Login required? Bot blocked? Out of stock? Age verification? Popup overlay? PageMap adds a barrier field with the diagnosis and suggested next steps
  • Cookie consent auto-dismiss — 7 CMP providers auto-detected (Cookiebot, OneTrust, TrustArc, Didomi, Quantcast, Usercentrics, generic fallback). 5-tier dismiss cascade: CMP JS API → Reject → Accept → Dismiss → Close symbol. GDPR reject-first default policy
  • Popup overlay detection — AX tree role="dialog" + HTML regex 2-phase detection. Promotional popups (newsletter, exit-intent) auto-dismissed
  • Bot detection awareness — Detects Cloudflare, Turnstile, reCAPTCHA, hCaptcha, and Akamai. Reports the provider and suggests wait/retry strategies
  • Stale ref recovery — When DOM changes invalidate refs, PageMap returns clear guidance to re-fetch
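
The cookie consent cascade is an ordered fallback: try the least intrusive strategy first and stop at the first one that works. The sketch below illustrates that control flow only. The tier names follow the order listed above, while the strategy callables and the `dismiss_consent` function are hypothetical stand-ins, not PageMap internals.

```python
from typing import Callable, Optional

# Tier order as listed above: CMP JS API, Reject, Accept, Dismiss, Close symbol.
CASCADE = ["cmp_js_api", "reject", "accept", "dismiss", "close_symbol"]


def dismiss_consent(strategies: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Try each tier in order; return the name of the first that succeeds."""
    for tier in CASCADE:
        attempt = strategies.get(tier)
        if attempt is not None and attempt():
            return tier
    # Nothing worked: the caller could report a cookie_consent barrier.
    return None


if __name__ == "__main__":
    # Hypothetical page where no CMP API is exposed but a Reject button exists.
    outcome = dismiss_consent(
        {
            "cmp_js_api": lambda: False,
            "reject": lambda: True,
            "accept": lambda: True,
        }
    )
    print(outcome)
```

Because "reject" sits ahead of "accept" in the list, the reject-first policy falls out of the ordering rather than needing a separate rule.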

Content Intelligence

  • 8 JSON-LD schemas — Product, NewsArticle, VideoObject, FAQPage, Event, LocalBusiness, BreadcrumbList, and ItemList
  • Metadata extraction — Prices, ratings, reviews, descriptions, images from structured data and DOM fallbacks
  • 2-layer caching — Cache hit (~10ms), content refresh (~500ms), full rebuild (~1.5s). Diff-based updates for unchanged sections
  • Delta evidence packet output - Optional to_delta_packet() serializer emits digest-bound evidence units, claim candidates, provenance, and authority flags for downstream memory/review systems without changing the default MCP output
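
To make the JSON-LD part concrete, here is a minimal standard-library sketch of pulling structured data out of a page. It is not PageMap's extractor: the class and function names are made up for this example, and it only matches a plain string `@type`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect JSON-LD objects from <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.items: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed structured data is skipped, not fatal
            # A script tag may hold one object or a list of objects.
            candidates = parsed if isinstance(parsed, list) else [parsed]
            self.items.extend(c for c in candidates if isinstance(c, dict))


def extract_jsonld(html: str, wanted_type: str) -> list[dict]:
    """Return JSON-LD objects whose @type equals `wanted_type`."""
    collector = JsonLdCollector()
    collector.feed(html)
    return [item for item in collector.items if item.get("@type") == wanted_type]
```

A DOM fallback, as mentioned in the metadata bullet, would take over when this returns an empty list.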

10 Languages

Locale auto-detected from URL. Token budgets adjusted for CJK scripts.

| Language | Locale | Language | Locale |
|---|---|---|---|
| English | en | Chinese | zh |
| Korean | ko | Spanish | es |
| Japanese | ja | Italian | it |
| French | fr | Portuguese | pt |
| German | de | Dutch | nl |
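
The README does not say which URL signals drive locale detection, so the following is only a guess at how such a detector could work. It checks a path segment, then a subdomain, then a country-code TLD. The heuristics, the `TLD_HINTS` mapping, and the `detect_locale` name are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# The ten locale codes from the table above.
SUPPORTED = {"en", "ko", "ja", "fr", "de", "zh", "es", "it", "pt", "nl"}

# Hypothetical TLD hints; a real mapping would need to be far more complete.
TLD_HINTS = {
    "kr": "ko", "jp": "ja", "cn": "zh", "fr": "fr", "de": "de",
    "es": "es", "it": "it", "pt": "pt", "nl": "nl",
}


def detect_locale(url: str, default: str = "en") -> str:
    """Guess a locale from URL structure alone (assumed heuristics)."""
    parts = urlsplit(url)
    # 1. Short path segments such as /ko/ or /ja-JP/.
    for segment in parts.path.lower().split("/"):
        lang = segment.replace("_", "-").split("-")[0]
        if lang in SUPPORTED and len(segment) <= 5:
            return lang
    # 2. Subdomain labels such as fr.example.com.
    labels = (parts.hostname or "").lower().split(".")
    for label in labels[:-2]:
        if label in SUPPORTED:
            return label
    # 3. Country-code TLD, otherwise the default.
    return TLD_HINTS.get(labels[-1], default)
```

A path segment that happens to be a two-letter word would trigger a false positive here, which is one reason a production detector likely needs more signals than this sketch uses.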

Deployment

Local (STDIO)

Default mode. Runs as a local MCP server — no server setup needed.

```bash
retio-pagemap
```

Docker

```bash
docker run -p 8000:8000 retio1001/pagemap --transport http
```

Multi-architecture images (amd64/arm64) available on Docker Hub and GitHub Container Registry.


Python API

```python
import asyncio
from pagemap.browser_session import BrowserSession
from pagemap.delta_serializer import to_delta_packet
from pagemap.page_map_builder import build_page_map_live
from pagemap.serializer import to_agent_prompt, to_json

async def main():
    async with BrowserSession() as session:
        page_map = await build_page_map_live(session, "https://example.com/product/123")
        print(to_agent_prompt(page_map))   # Agent-optimized text format
        print(to_json(page_map))           # Structured JSON
        print(to_delta_packet(page_map))   # Digest-bound evidence packet
        print(page_map.page_type)          # "product_detail"
        print(page_map.interactables)      # [Interactable(ref=1, role="button", ...)]
        print(page_map.metadata)           # {"name": "...", "price": "..."}

asyncio.run(main())
```

For offline processing (no browser):

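```python
from pathlib import Path
from pagemap.page_map_builder import build_page_map_offline

# Read the saved HTML as UTF-8; read_text also closes the file handle.
html = Path("page.html").read_text(encoding="utf-8")
page_map = build_page_map_offline(html, url="https://example.com/product/123")
```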

Security

PageMap treats all web content as untrusted input:

  • SSRF defense — Multi-layer protection against server-side request forgery
  • Prompt injection defense — Content boundaries, role-prefix stripping, suspicious content flagging
  • robots.txt compliance — RFC 9309 compliant. --ignore-robots opt-out flag
  • Resource guards — DOM node limit, HTML size limit, response size limit
  • Session isolation — Each session has independent cookies and storage, automatically cleaned up
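
A robots.txt check of the kind described above can be approximated with Python's standard library. This sketch is not PageMap's implementation: `allowed_by_robots` and the `pagemap-example` user agent are hypothetical, and `urllib.robotparser` may differ from RFC 9309 in edge cases such as wildcard paths.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /checkout\n"
    print(allowed_by_robots(rules, "pagemap-example", "https://example.com/product/123"))
    print(allowed_by_robots(rules, "pagemap-example", "https://example.com/checkout/step1"))
```

Parsing text that was fetched separately keeps the network request under the caller's control, so the same SSRF and size limits can apply to robots.txt itself.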

Local development: Private IPs are blocked by default. Use --allow-local or PAGEMAP_ALLOW_LOCAL=1.
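
One layer of an SSRF defense is refusing URLs whose host is a private, loopback, or link-local address. The sketch below shows that single check using the standard library. The function names and the `allow_local` parameter are illustrative (the parameter only mirrors the idea behind --allow-local), and a complete defense would also need to pin the resolved address at connection time to resist DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_private_target(url: str) -> bool:
    """Return True when the URL's host maps to a non-public address."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no usable host: treat as unsafe
    try:
        addresses = [ipaddress.ip_address(host)]  # literal IPv4 or IPv6
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return True  # unresolvable hosts are rejected in this sketch
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    return any(
        a.is_private or a.is_loopback or a.is_link_local or a.is_reserved
        for a in addresses
    )


def check_url(url: str, allow_local: bool = False) -> None:
    """Raise ValueError for private targets unless local access is allowed."""
    if not allow_local and is_private_target(url):
        raise ValueError(f"blocked private or local address: {url}")
```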

Disclaimer

Users are responsible for complying with the terms of service of target websites and all applicable laws when using PageMap.


Troubleshooting

"spawn uvx ENOENT" (Claude Desktop on macOS) — Claude Desktop does not inherit your shell PATH. Run which uvx and use the absolute path in your config.

First page takes a long time — Chromium cold start takes ~10-30s on first navigation. Subsequent pages load in 1-3 seconds.

Localhost blocked — Use --allow-local flag or set PAGEMAP_ALLOW_LOCAL=1.

Chromium not found — Run pip install retio-pagemap && playwright install chromium to install manually.


Requirements

  • Python 3.11+
  • Chromium (auto-installed on first use)

Community

Have a question or idea? Join the conversation in GitHub Discussions.

Development

```bash
git clone https://github.com/Retio-ai/Retio-pagemap.git
cd Retio-pagemap
uv sync --group dev
playwright install chromium
uv run pytest --tb=short -q
```

Pricing

Local (STDIO) — Free forever. Self-hosted, open source under AGPL-3.0.

Cloud API — Hosted multi-tenant server with auth, rate limiting, and credit-based billing. Contact retio1001@retio.ai for access.

License

AGPL-3.0-only — see LICENSE for the full text.

For commercial licensing options, contact retio1001@retio.ai.


<!-- ============================================================ --> <!-- AGENT REFERENCE --> <!-- ============================================================ -->

For Agents

This section is written for AI agents using PageMap as an MCP tool.

Tools

| Tool | When to use |
|---|---|
| get_page_map | Start here. Navigate to a URL and get a full structured map with numbered refs. |
| execute_action | Click, type, select, or hover using a ref number from the last get_page_map. |
| fill_form | Fill multiple form fields in one call. More efficient than sequential execute_action calls. |
| get_page_state | Check current URL and title without a full rebuild. Use after actions that may navigate. |
| scroll_page | Scroll to reveal lazy-loaded content before calling get_page_map again. |
| wait_for | Wait for dynamic content to appear (e.g. after a search or form submit). |
| take_screenshot | Capture the visual state when the PageMap alone is ambiguous. |
| navigate_back | Go back one step in browser history. |
| open_tab | Open a new browser tab and navigate to a URL. |
| switch_tab | Switch to a different open tab by index. |
| list_tabs | List all open tabs with their URLs and titles. |
| close_tab | Close a tab by index. |
| batch_get_page_map | Fetch multiple URLs in parallel. Use for comparison tasks. |

Output Format

```yaml
URL: https://example.com/product/123
Title: Product Name
Type: product_detail          # auto-detected page type

## Actions
[1] button: Add to cart (click)
[2] select: Size (select)  options: S, M, L, XL
[3] link: See all reviews (click)
...

## Info
Price: $49.99
Rating: 4.5 / 5 (128 reviews)
Description: ...

## Images
  [1] https://cdn.example.com/product.jpg

## Meta
Tokens: ~1,800 | Interactables: 24 | Generation: 380ms
```

  • ## Actions — Every interactive element on the page with a stable ref number.
  • ## Info — Key page content extracted from HTML: prices, titles, ratings, descriptions.
  • ## Images — Product/content image URLs.
  • ## Meta — Token count, interactable count, generation time.
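
If you consume the text output programmatically rather than through a model, the Actions section is regular enough to parse. The sketch below is written against the sample output above only; the regex, the `Action` dataclass, and its field names are assumptions, and the real format may contain variants this does not handle.

```python
import re
from dataclasses import dataclass, field

# Matches lines like: [2] select: Size (select)  options: S, M, L, XL
ACTION_LINE = re.compile(
    r"^\[(?P<ref>\d+)\]\s+(?P<role>\w+):\s+(?P<name>.*?)\s+\((?P<affordance>\w+)\)"
    r"(?:\s+options:\s+(?P<options>.*))?$"
)


@dataclass
class Action:
    ref: int
    role: str
    name: str
    affordance: str
    options: list[str] = field(default_factory=list)


def parse_actions(page_map_text: str) -> list[Action]:
    """Extract entries from the '## Actions' section of the text output."""
    actions: list[Action] = []
    in_actions = False
    for raw in page_map_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_actions = line == "## Actions"
            continue
        match = ACTION_LINE.match(line) if in_actions else None
        if match:
            options = match["options"].split(", ") if match["options"] else []
            actions.append(
                Action(int(match["ref"]), match["role"], match["name"],
                       match["affordance"], options)
            )
    return actions
```

Tracking the current section keeps the numbered entries under ## Images from being mistaken for actions.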

Barrier Detection

When PageMap encounters a page-level obstacle, it includes a barrier field in the response:

```yaml
State:
  barrier: login_required
  barrier_hint: "Login form detected with email + password fields. Use fill_form to authenticate."
```

Possible barriers: cookie_consent, login_required, bot_blocked, out_of_stock, empty_results, error_page, age_verification, region_restricted, popup_overlay.

When you see a barrier: follow the barrier_hint guidance. For bot_blocked, wait and retry. For login_required, use fill_form with credentials.
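
The guidance above amounts to a small dispatch on the barrier value. A minimal sketch, assuming the State block has already been parsed into a dict; the `plan_for_barrier` function and the step names it returns are hypothetical labels for the agent's own control loop.

```python
def plan_for_barrier(state: dict) -> tuple[str, str]:
    """Map a parsed State block to (next_step, hint)."""
    barrier = state.get("barrier")
    hint = state.get("barrier_hint", "")
    if barrier is None:
        return ("continue", "")
    if barrier == "bot_blocked":
        return ("wait_and_retry", hint)  # back off, then call get_page_map again
    if barrier == "login_required":
        return ("fill_form", hint)  # authenticate with fill_form
    # Every other barrier: defer to the hint text PageMap supplied.
    return ("follow_hint", hint)


if __name__ == "__main__":
    state = {
        "barrier": "login_required",
        "barrier_hint": "Login form detected with email + password fields. "
                        "Use fill_form to authenticate.",
    }
    print(plan_for_barrier(state))
```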

Ref Lifecycle

Refs are assigned by get_page_map and remain valid until the page state changes.

Refs are invalidated when:

  • The page navigates to a new URL
  • A DOM mutation occurs (modal opens, SPA navigation, accordion toggles)
  • execute_action causes a page-level change

When you get a stale ref error: call get_page_map again to get fresh refs before retrying.
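
The recovery rule can be wrapped in a retry helper. This is a sketch under several assumptions: `call_tool(name, arguments)` stands in for whatever MCP client wrapper you use, `StaleRefError` is a made-up exception representing the stale ref response, and the argument names `url`, `ref`, and `action` are guesses rather than the documented tool schema.

```python
from typing import Any, Callable


class StaleRefError(Exception):
    """Hypothetical error standing in for PageMap's stale ref response."""


def act_with_refresh(
    call_tool: Callable[[str, dict], Any],
    find_ref: Callable[[Any], int],
    url: str,
    action: str,
    max_attempts: int = 2,
) -> Any:
    """Run an action, re-fetching the page map if the ref went stale."""
    page_map = call_tool("get_page_map", {"url": url})
    for attempt in range(max_attempts):
        ref = find_ref(page_map)  # always pick the ref from the newest map
        try:
            return call_tool("execute_action", {"ref": ref, "action": action})
        except StaleRefError:
            if attempt == max_attempts - 1:
                raise
            # Fresh refs before retrying, as the rule above requires.
            page_map = call_tool("get_page_map", {"url": url})
```

Re-running `find_ref` against the new map matters because the same element may come back under a different number.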

Token Budget Behavior

When a page exceeds the token budget, content is pruned in this order:

  1. Navigation menus, footers, sidebars removed first
  2. Secondary body content trimmed
  3. ## Actions and ## Info are always preserved

If key content seems missing, try scroll_page to load lazy content, then get_page_map again.
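
The pruning order above can be pictured as a three-step pass over page sections. The sketch below is illustrative only: the section kinds, the token counts, and the `prune` function are assumptions, and PageMap's actual budgeting logic is not described in this document.

```python
from dataclasses import dataclass

CHROME = {"nav", "footer", "sidebar"}  # removed first
PROTECTED = {"actions", "info"}        # never touched


@dataclass
class Section:
    kind: str
    tokens: int


def prune(sections: list[Section], budget: int) -> list[Section]:
    """Apply the pruning order to copies of `sections` until they fit `budget`."""
    kept = [Section(s.kind, s.tokens) for s in sections]
    overflow = sum(s.tokens for s in kept) - budget
    # Step 1: drop navigation menus, footers, and sidebars while over budget.
    for section in [s for s in kept if s.kind in CHROME]:
        if overflow <= 0:
            break
        kept.remove(section)
        overflow -= section.tokens
    # Step 2: trim secondary body content by just enough tokens.
    for section in kept:
        if overflow <= 0:
            break
        if section.kind == "body_secondary":
            cut = min(section.tokens, overflow)
            section.tokens -= cut
            overflow -= cut
    # Step 3: sections in PROTECTED are returned unchanged.
    return kept
```

In this sketch a page whose Actions and Info sections alone exceed the budget stays over budget, since those two sections are never cut.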

Recommended Workflow

```text
1. get_page_map(url)          → read Actions + Info, pick refs
2. execute_action(ref, ...)   → interact
3. get_page_state()           → confirm navigation occurred
4. get_page_map(new_url)      → get fresh refs for next step
```

For pages with dynamic content (search results, filters):

```text
1. get_page_map(url)
2. execute_action(ref, "click")    → trigger search/filter
3. wait_for(text="results")        → wait for content
4. get_page_map(url)               → get updated map
```
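
The dynamic-content flow translates directly into four tool calls. In this sketch `call_tool(name, arguments)` is an assumed wrapper around your MCP client and `pick_ref` is a caller-supplied selector. The argument names `url`, `ref`, and `action` are guesses, while `text="results"` comes from the steps above.

```python
from typing import Any, Callable

ToolCall = Callable[[str, dict], Any]


def search_and_read(call_tool: ToolCall, url: str, pick_ref: Callable[[Any], int]) -> Any:
    """Map the page, trigger the search or filter, wait, then map again."""
    page_map = call_tool("get_page_map", {"url": url})
    call_tool("execute_action", {"ref": pick_ref(page_map), "action": "click"})
    call_tool("wait_for", {"text": "results"})
    # Refs from the first map are stale by now, so return a fresh one.
    return call_tool("get_page_map", {"url": url})
```

Passing `call_tool` in as a parameter keeps the flow testable with a fake client before it is wired to a live browser session.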

Known Limitations

  • Login-gated pages — PageMap does not manage sessions or cookies. Authentication must be handled externally.
  • Heavy bot detection (Cloudflare, Akamai) — May block automated access. PageMap detects the provider and suggests strategies, but cannot bypass active bot mitigation.
  • Private network access — Blocked by default. Requires --allow-local flag.
  • iframes — Cross-origin iframes are not accessible due to browser security policies.

PageMap — Structured Web Intelligence for the Agent Era.
