Optical Context MCP

Efficiency and Workflow

by chrboebel

Compress large, OCR-heavy PDFs into dense packed images for use in agent workflows.

README

<!-- mcp-name: io.github.ChrBoebel/optical-context-mcp -->
<p align="center">
  <img src="./assets/optical-context-logo.png" alt="Optical Context MCP logo" width="680">
</p>
<h1 align="center">Optical Context MCP</h1>
<p align="center">
  Compress OCR-heavy PDFs into dense packed images so agents can work with long visual documents.
</p>
<p align="center">
  <a href="https://pypi.org/project/optical-context-mcp/"><img src="https://img.shields.io/pypi/v/optical-context-mcp.svg" alt="PyPI version"></a>
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python 3.11+"></a>
  <a href="https://gofastmcp.com/"><img src="https://img.shields.io/badge/MCP-FastMCP-111111.svg" alt="FastMCP"></a>
  <a href="https://github.com/ChrBoebel/optical-context-mcp/actions/workflows/ci.yml"><img src="https://github.com/ChrBoebel/optical-context-mcp/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="MIT License"></a>
</p>

Optical Context MCP is built for one specific job: turning large, visually structured PDFs into a smaller set of retrievable packed images for agent workflows.

It reads a local PDF, runs OCR with Mistral, recomposes the extracted text and figures into dense PNGs, and exposes those artifacts over MCP for batch retrieval.

What It Does

  • reads a local PDF from the MCP host machine
  • extracts page markdown and embedded images with Mistral OCR
  • packs that content into dense PNGs that preserve visual grouping
  • optionally sizes embedded figures with a bundled technical-document model
  • stores a manifest and temp job artifacts for follow-up retrieval
  • lets an agent pull only the packed images it needs

Where It Fits

Use it for:

  • operating manuals
  • scanned handbooks
  • product catalogs
  • PDF slide decks
  • visually structured OCR-heavy documents

Skip it for:

  • tiny PDFs
  • clean text-native PDFs where normal extraction is enough
  • workflows that require exact page-faithful rendering
  • cases where OCR cost is not justified

Example Result

The image below shows a real local validation run on a public research paper with dense text, figures, charts, and page-level visual structure. The packed image on the right consolidates the seven source pages shown on the left.

<p align="center"> <img src="./assets/original-vs-packed-comparison-straight-arrow.png" alt="Side-by-side comparison of original pages and the generated packed output" width="980"> </p>

Example local run facts from the generated manifest:

  • source paper pages: 22
  • previewed source page range: 15 to 21
  • extracted images: 30
  • packed output images: 6
  • example packed image size: 986x1084
  • example packed image file size: 536,697 bytes
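As a quick sanity check, the ratios implied by those manifest facts can be worked out directly. This is only arithmetic on the numbers listed above; the variable names are illustrative and not part of the package.

```python
# Numbers copied from the example manifest facts listed above.
source_pages = 22
packed_images = 6
packed_image_bytes = 536_697  # file size of the one example packed image

# Average number of source pages represented by each packed image.
pages_per_packed_image = source_pages / packed_images

# Pages in the previewed range 15..21, counting both endpoints.
preview_pages = 21 - 15 + 1

print(f"pages per packed image: {pages_per_packed_image:.2f}")
print(f"previewed pages: {preview_pages}")
print(f"example packed image size: {packed_image_bytes / 1024:.0f} KiB")
```

On average, each packed image in this run stands in for between three and four source pages.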

This example shows the intended workflow: take a long, visually structured PDF and compress it into a smaller set of retrievable packed images that still preserve the visual structure of the source.

Install

```bash
python -m pip install optical-context-mcp
```

Install with the adaptive sizing runtime:

```bash
python -m pip install "optical-context-mcp[ml]"
```

Run without installing:

```bash
uvx optical-context-mcp
```

  • MISTRAL_API_KEY is required for compress_pdf
  • packed images are always stored locally under the system temp directory
  • compress_pdf returns up to 30 packed images inline by default
  • the adaptive sizing checkpoint is bundled with the package
  • adaptive sizing activates automatically when torch and torchvision are available
  • set OPTICAL_CONTEXT_DISABLE_ADAPTIVE_SIZING=1 to force the legacy fixed sizing
  • set OPTICAL_CONTEXT_ADAPTIVE_MODEL_PATH=/path/to/model.pt to override the bundled checkpoint

For pinned shared setups:

```bash
uvx --from optical-context-mcp==0.1.4 optical-context-mcp
```

Run

Default transport is stdio:

```bash
optical-context-mcp
```

Claude Code

Register the server in a project:

```bash
claude mcp add -s project optical-context -- uvx optical-context-mcp
```
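Project-scoped registrations in Claude Code are usually stored in a `.mcp.json` file at the project root. The sketch below builds what the equivalent entry would likely look like, assuming the common `mcpServers` layout; check the Claude Code documentation before relying on the exact shape. It only prints the JSON and does not write any file.

```python
import json

# Assumed layout: a top-level "mcpServers" map keyed by server name.
# The command and args mirror the `claude mcp add` invocation above.
config = {
    "mcpServers": {
        "optical-context": {
            "command": "uvx",
            "args": ["optical-context-mcp"],
        }
    }
}

rendered = json.dumps(config, indent=2)
print(rendered)
```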

Typical use:

  1. call compress_pdf
  2. inspect the returned manifest
  3. fetch packed images with get_packed_images
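At the protocol level, each of those steps is an MCP `tools/call` request. The sketch below builds the JSON-RPC messages an MCP client might send for this workflow. The tool names come from this README, but the argument names (`pdf_path`, `job_id`, `indices`) and the placeholder values are assumptions made for illustration; the real parameters are defined by the server's tool schemas.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values below are hypothetical placeholders.
requests = [
    # 1. Run OCR plus recomposition and create a stored job.
    tool_call("compress_pdf", {"pdf_path": "/path/to/manual.pdf"}),
    # 2. Reload the manifest later if the inline result is no longer at hand.
    tool_call("get_job_manifest", {"job_id": "JOB_ID_FROM_STEP_1"}),
    # 3. Fetch only the packed images that are needed.
    tool_call("get_packed_images", {"job_id": "JOB_ID_FROM_STEP_1", "indices": [0, 1]}),
]

for request in requests:
    print(json.dumps(request))
```

In practice an MCP client library or the agent host sends these messages for you; the sketch only makes the order of calls concrete.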

MCP Tools

  • compress_pdf: run OCR plus recomposition and create a stored job
  • get_job_manifest: load metadata for an existing job
  • get_packed_images: fetch one or more packed PNGs from an existing job
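Because get_packed_images accepts one or more images per call, an agent can page through a large job in small batches instead of loading everything at once. A minimal sketch of that batching logic follows, assuming zero-based image indices and a caller-chosen batch size; neither is confirmed by this README.

```python
from typing import Iterator, List


def batched_indices(total_images: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive index batches covering 0..total_images-1."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_images, batch_size):
        # The last batch may be shorter than batch_size.
        yield list(range(start, min(start + batch_size, total_images)))


# Example: a job with 6 packed images, fetched 4 at a time.
batches = list(batched_indices(total_images=6, batch_size=4))
print(batches)
```

Each batch would then become the image selection for one get_packed_images call.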

How It Works

```mermaid
flowchart LR
    A["Local PDF"] --> B["Mistral OCR"]
    B --> C["Page markdown + embedded images"]
    C --> D["Recomposition engine"]
    D --> E["Dense packed PNG images"]
    E --> F["Stored job artifacts"]
    F --> G["Agent fetches manifest or image batches over MCP"]
```

Why Packed Images Instead Of Just OCR Text

Packed images keep layout cues that a plain OCR text dump tends to lose:

  • section grouping
  • table-like layout
  • captions near figures
  • visual adjacency between text and embedded graphics

For many vision-capable agents, that is a better intermediate format than a plain OCR dump.

Current Scope

  • depends on Mistral OCR
  • currently handles local file paths, not remote uploads
  • stores artifacts in the local system temp directory by default
  • optimized for compression and retrieval, not final polished markdown generation
  • quality depends on OCR quality and the visual density of the source document
  • adaptive sizing falls back safely to fixed medium image sizing when the ML runtime is absent

Roadmap

  • make the OCR layer provider-agnostic so different OCR backends can be swapped behind the same MCP workflow

Development

```bash
uv venv --python /opt/homebrew/bin/python3.11 .venv
uv pip install --python .venv/bin/python -e ".[dev]"
.venv/bin/python -m pytest
```
