What is Optical Context MCP?
Compresses large, OCR-heavy PDFs into dense packed images for use in agent workflows.
README
Optical Context MCP is built for one specific job: turning large, visually structured PDFs into a smaller set of retrievable packed images for agent workflows.
It reads a local PDF, runs OCR with Mistral, recomposes the extracted text and figures into dense PNGs, and exposes those artifacts over MCP for batch retrieval.
What It Does
- reads a local PDF from the MCP host machine
- extracts page markdown and embedded images with Mistral OCR
- packs that content into dense PNGs that preserve visual grouping
- optionally sizes embedded figures with a bundled technical-document model
- stores a manifest and temp job artifacts for follow-up retrieval
- lets an agent pull only the packed images it needs
Where It Fits
Use it for:
- operating manuals
- scanned handbooks
- product catalogs
- PDF slide decks
- visually structured OCR-heavy documents
Skip it for:
- tiny PDFs
- clean text-native PDFs where normal extraction is enough
- workflows that require exact page-faithful rendering
- cases where OCR cost is not justified
Example Result
The image below shows a real local validation run on a public research paper with dense text, figures, charts, and page-level visual structure. The packed image on the right consolidates the seven source pages shown on the left.
<p align="center"> <img src="./assets/original-vs-packed-comparison-straight-arrow.png" alt="Side-by-side comparison of original pages and the generated packed output" width="980"> </p>
Example local run facts from the generated manifest:
- source paper pages: 22
- previewed source page range: 15 to 21
- extracted images: 30
- packed output images: 6
- example packed image size: 986x1084
- example packed image file size: 536,697 bytes
This example shows the intended workflow: take a long, visually structured PDF and compress it into a smaller set of retrievable packed images that still preserve the visual structure of the source.
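To put the example figures in proportion, the short calculation below derives the average number of source pages per packed image and the storage density of the example image. It uses only the numbers listed above; the variable names are illustrative, and the average says nothing about how pages are actually distributed across individual packed images.

```python
# Figures copied from the example manifest facts listed above.
source_pages = 22
packed_images = 6
width_px, height_px = 986, 1084
file_size_bytes = 536_697

# Average number of source pages represented by one packed image.
pages_per_image = source_pages / packed_images

# Pixel count and storage density of the example packed image.
pixels = width_px * height_px
bytes_per_pixel = file_size_bytes / pixels

print(f"pages per packed image: {pages_per_image:.2f}")
print(f"example image pixels:   {pixels:,}")
print(f"bytes per pixel:        {bytes_per_pixel:.3f}")
```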
Install
python -m pip install optical-context-mcp
Install with the adaptive sizing runtime:
python -m pip install "optical-context-mcp[ml]"
Run without installing:
uvx optical-context-mcp
- `MISTRAL_API_KEY` is required for `compress_pdf`
- packed images are always stored locally under the system temp directory
- `compress_pdf` returns up to `30` packed images inline by default
- the adaptive sizing checkpoint is bundled with the package
- adaptive sizing activates automatically when `torch` and `torchvision` are available
- set `OPTICAL_CONTEXT_DISABLE_ADAPTIVE_SIZING=1` to force the legacy fixed sizing
- set `OPTICAL_CONTEXT_ADAPTIVE_MODEL_PATH=/path/to/model.pt` to override the bundled checkpoint
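The settings above can be assembled in a launcher script. The following is a minimal sketch, not part of the package: `build_server_env` and `adaptive_runtime_available` are hypothetical helper names, and the import check only mirrors the documented condition (both `torch` and `torchvision` available), which may differ from how the server detects the runtime internally.

```python
import importlib.util
import os


def adaptive_runtime_available() -> bool:
    """Return True when both torch and torchvision can be located for import."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "torchvision")
    )


def build_server_env(api_key, disable_adaptive=False, model_path=None, base=None):
    """Build an environment mapping for launching optical-context-mcp."""
    if not api_key:
        raise ValueError("MISTRAL_API_KEY is required for compress_pdf")
    # Start from the current environment unless a base mapping is supplied.
    env = dict(os.environ if base is None else base)
    env["MISTRAL_API_KEY"] = api_key
    if disable_adaptive:
        # Forces the legacy fixed sizing, as documented above.
        env["OPTICAL_CONTEXT_DISABLE_ADAPTIVE_SIZING"] = "1"
    if model_path is not None:
        # Overrides the bundled checkpoint, as documented above.
        env["OPTICAL_CONTEXT_ADAPTIVE_MODEL_PATH"] = str(model_path)
    return env


if __name__ == "__main__":
    env = build_server_env("example-key", disable_adaptive=True, base={})
    print(env)
    print("adaptive runtime importable:", adaptive_runtime_available())
```

The resulting mapping can be passed as the `env` argument of `subprocess.Popen` when starting the server from a script.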
For pinned shared setups:
uvx --from optical-context-mcp==0.1.4 optical-context-mcp
Run
Default transport is stdio:
optical-context-mcp
Claude Code
Register the server in a project:
claude mcp add -s project optical-context -- uvx optical-context-mcp
Typical use:
- call `compress_pdf`
- inspect the returned manifest
- fetch packed images with `get_packed_images`
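The last step, fetching only the images that are needed, often means requesting them in small batches. Below is a minimal sketch of that batching logic. The manifest field names (`job_id`, `packed_image_count`) and the zero-based indexing are assumptions made for illustration; the real manifest schema may differ.

```python
from typing import Iterator, List


def batch_indices(total_images: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive index batches covering range(total_images)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_images, batch_size):
        yield list(range(start, min(start + batch_size, total_images)))


# Hypothetical manifest excerpt; the real manifest schema may differ.
manifest = {"job_id": "example-job", "packed_image_count": 6}

for batch in batch_indices(manifest["packed_image_count"], batch_size=4):
    # An agent would pass each batch to get_packed_images at this point.
    print(batch)
```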
MCP Tools
- `compress_pdf`: run OCR plus recomposition and create a stored job
- `get_job_manifest`: load metadata for an existing job
- `get_packed_images`: fetch one or more packed PNGs from an existing job
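At the protocol level, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such requests for the three tools. The tool names come from the list above, but every argument name (`pdf_path`, `job_id`, `indices`) is a hypothetical placeholder; the actual input schemas can be read from the server's `tools/list` response.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this illustrative session.
_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are hypothetical placeholders, not the real schema.
requests = [
    tool_call("compress_pdf", {"pdf_path": "/path/to/manual.pdf"}),
    tool_call("get_job_manifest", {"job_id": "example-job"}),
    tool_call("get_packed_images", {"job_id": "example-job", "indices": [0, 1]}),
]

for request in requests:
    print(json.dumps(request))
```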
How It Works
flowchart LR
A["Local PDF"] --> B["Mistral OCR"]
B --> C["Page markdown + embedded images"]
C --> D["Recomposition engine"]
D --> E["Dense packed PNG images"]
E --> F["Stored job artifacts"]
F --> G["Agent fetches manifest or image batches over MCP"]
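To make the recomposition step more concrete, here is a toy next-fit packer that places content blocks onto fixed-height canvases in reading order. This is an illustration of the general idea only, not the project's recomposition engine: the `Block` type, the single-column layout, and the next-fit strategy are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    label: str   # e.g. a paragraph, a table, or a figure with its caption
    height: int  # rendered height in pixels


def pack_next_fit(blocks: List[Block], canvas_height: int) -> List[List[Block]]:
    """Pack blocks into canvases in reading order using next-fit."""
    canvases: List[List[Block]] = []
    current: List[Block] = []
    used = 0
    for block in blocks:
        if block.height > canvas_height:
            raise ValueError(f"{block.label} is taller than the canvas")
        # Close the open canvas when the next block would overflow it.
        if current and used + block.height > canvas_height:
            canvases.append(current)
            current, used = [], 0
        current.append(block)
        used += block.height
    if current:
        canvases.append(current)
    return canvases


blocks = [
    Block("intro", 400),
    Block("table", 500),
    Block("figure", 300),
    Block("section", 700),
    Block("notes", 200),
]
for i, canvas in enumerate(pack_next_fit(blocks, canvas_height=1000)):
    print(i, [b.label for b in canvas])
```

Next-fit never reorders blocks, so content that was adjacent in the source stays adjacent in the output, at the cost of some unused space on each canvas.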
Why Packed Images Instead Of Just OCR Text
Packed images keep layout cues that a plain OCR text dump tends to lose:
- section grouping
- table-like layout
- captions near figures
- visual adjacency between text and embedded graphics
For many vision-capable agents, that is a better intermediate format than a plain OCR dump.
Current Scope
- depends on Mistral OCR
- currently handles local file paths, not remote uploads
- stores artifacts in the local system temp directory by default
- optimized for compression and retrieval, not final polished markdown generation
- quality depends on OCR quality and the visual density of the source document
- adaptive sizing falls back safely to fixed medium image sizing when the ML runtime is absent
Roadmap
- make the OCR layer provider-agnostic so different OCR backends can be swapped behind the same MCP workflow
Development
uv venv --python /opt/homebrew/bin/python3.11 .venv
uv pip install --python .venv/bin/python -e ".[dev]"
.venv/bin/python -m pytest