Mineru Document Parsing Server

Productivity and Workflow

by demomagic

Provides document parsing by integrating with the Mineru API. Supports single and batch file parsing across multiple formats, with OCR, formula recognition, and table recognition. Parsing task status can be queried in real time, and documents in various languages are supported.

Mineru MCP Server

A Model Context Protocol (MCP) server that integrates with the Mineru API to parse documents.

Features

  • Single File Parsing: Create a parsing task for one document from its URL
  • Batch File Parsing: Upload and parse multiple files in one batch
  • Task Status Monitoring: Query parsing progress and results in real time
  • Multi-format Support: PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG and other formats
  • OCR Functionality: Optional OCR text recognition
  • Formula Recognition: Recognizes mathematical formulas
  • Table Recognition: Recognizes table structure
  • Multi-language Support: Chinese, English and other languages

Installation

```bash
npm install
```

Configuration

Before using the server, configure the Mineru API key:

```typescript
const config = {
  mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
  mineruBaseUrl: "https://mineru.net/api/v4" // Mineru API base URL
};
```
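As a rough illustration of how such a config might be checked and turned into request headers, the sketch below assumes the token is sent as a standard `Authorization: Bearer` header and that API requests carry JSON bodies. `MineruConfig`, `validateConfig`, and `buildHeaders` are hypothetical names used for illustration, not part of this server's code.

```typescript
// Hypothetical type mirroring the config object shown above.
interface MineruConfig {
  mineruApiKey: string;
  mineruBaseUrl: string;
}

// Returns a list of problems; an empty list means the config looks usable.
function validateConfig(config: MineruConfig): string[] {
  const problems: string[] = [];
  if (config.mineruApiKey.trim() === "") {
    problems.push("mineruApiKey is empty");
  }
  if (!/^https?:\/\//.test(config.mineruBaseUrl)) {
    problems.push("mineruBaseUrl must start with http:// or https://");
  }
  return problems;
}

// Assumption: the API expects the token in a Bearer Authorization header
// and JSON bodies for task requests. File uploads are a separate case
// (see Important Notes) and are not covered here.
function buildHeaders(config: MineruConfig): Record<string, string> {
  return {
    Authorization: `Bearer ${config.mineruApiKey}`,
    "Content-Type": "application/json",
  };
}
```

Validating the config at startup surfaces a missing token before the first request, rather than as an A0202 error later.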

Available Tools

1. create_parsing_task

Create a document parsing task for a single file

Parameters:

  • url (required): File URL
  • is_ocr (optional): Enable OCR, default false
  • enable_formula (optional): Enable formula recognition, default true
  • enable_table (optional): Enable table recognition, default true
  • language (optional): Document language, default "ch"
  • page_ranges (optional): Page ranges, e.g., "1-10,15-20"
  • model_version (optional): Model version, "v1" or "v2"
  • extra_formats (optional): Additional export formats, any of "docx", "html", "latex"
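The parameter list above can be written down as a type, with the documented defaults filled in on the client side. This is a minimal sketch: `ParsingTaskParams`, `withDefaults`, and `isValidPageRanges` are hypothetical helpers, and the page range check assumes the format is a comma-separated list of `N` or `N-M` items with 1-based page numbers, which is inferred only from the "1-10,15-20" example.

```typescript
// Parameter shape for create_parsing_task, taken from the list above.
interface ParsingTaskParams {
  url: string;
  is_ocr?: boolean;
  enable_formula?: boolean;
  enable_table?: boolean;
  language?: string;
  page_ranges?: string;
  model_version?: "v1" | "v2";
  extra_formats?: Array<"docx" | "html" | "latex">;
}

// Fill in the documented defaults. Fields without a documented default
// (page_ranges, model_version, extra_formats) are left as given.
function withDefaults(params: ParsingTaskParams): ParsingTaskParams {
  return {
    ...params,
    is_ocr: params.is_ocr ?? false,
    enable_formula: params.enable_formula ?? true,
    enable_table: params.enable_table ?? true,
    language: params.language ?? "ch",
  };
}

// Assumption: page_ranges is a comma-separated list of "N" or "N-M" items.
function isValidPageRanges(value: string): boolean {
  return value.split(",").every((part) => {
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (match === null) return false;
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    return start >= 1 && end >= start;
  });
}
```

Checking `page_ranges` locally is optional; the server remains the authority on which formats it accepts.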

2. get_task_status

Query parsing task status

Parameters:

  • task_id (required): Task ID

3. create_batch_parsing_task

Create a batch file upload parsing task (for local file uploads)

Parameters:

  • files (required): Array of file objects; each has a name plus optional per-file settings such as is_ocr and page_ranges
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats

4. create_batch_url_parsing_task

Create a batch URL parsing task (for remote file URLs)

Parameters:

  • files (required): Array of file objects; each has a url plus optional per-file settings such as is_ocr and page_ranges
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats
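Because a batch carries many files, it may be worth checking extensions locally before submitting, so that one bad entry is caught early. The sketch below uses the formats listed under Features and in error -60002; `SUPPORTED_EXTENSIONS`, `extensionOf`, and `findUnsupported` are hypothetical names, and the server may accept formats beyond this list.

```typescript
// Extensions named in the Features section and in error -60002.
const SUPPORTED_EXTENSIONS: string[] = [
  "pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg",
];

// Extract the lowercase extension from a file name or URL,
// ignoring any query string or fragment.
function extensionOf(nameOrUrl: string): string {
  const path = nameOrUrl.split(/[?#]/)[0] ?? "";
  const lastSegment = path.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  return dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
}

// Return the entries whose extension is not in the supported list.
function findUnsupported(namesOrUrls: string[]): string[] {
  return namesOrUrls.filter(
    (entry) => SUPPORTED_EXTENSIONS.indexOf(extensionOf(entry)) === -1,
  );
}
```

The same check works for the `name` values passed to create_batch_parsing_task and the `url` values passed to create_batch_url_parsing_task.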

5. get_batch_task_results

Query batch parsing task results (supports both URL batch parsing and local upload batch parsing)

Parameters:

  • batch_id (required): Batch task ID (from create_batch_url_parsing_task or create_batch_parsing_task)

Usage Examples

Single File Parsing

```typescript
// Create parsing task
const taskResult = await create_parsing_task({
  url: "https://example.com/document.pdf",
  is_ocr: true,
  enable_formula: true,
  language: "en"
});

// Query task status
const status = await get_task_status({
  task_id: taskResult.task_id
});
```
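A single status query rarely catches a task at the moment it finishes, so callers typically poll. The following is a sketch of one way to do that with capped exponential backoff; `backoffDelays`, `sleep`, and `pollUntilFinished` are hypothetical helpers. The completion check is passed in as a callback, so no response field names are assumed.

```typescript
// Delay schedule for polling: doubles from baseMs, capped at maxMs.
function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, maxMs));
  }
  return delays;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls fetchStatus until isFinished returns true or the delays run out.
// Both callbacks come from the caller, so this helper does not depend on
// the shape of the status response.
async function pollUntilFinished<T>(
  fetchStatus: () => Promise<T>,
  isFinished: (status: T) => boolean,
  delays: number[],
): Promise<T> {
  let latest = await fetchStatus();
  for (const delay of delays) {
    if (isFinished(latest)) return latest;
    await sleep(delay);
    latest = await fetchStatus();
  }
  if (isFinished(latest)) return latest;
  throw new Error("Task did not finish within the polling budget");
}
```

A caller would pass `() => get_task_status({ task_id })` as `fetchStatus`, along with a predicate that matches whatever completion indicator the status response carries.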

Batch File Upload Parsing

```typescript
// Create batch upload task
const batchResult = await create_batch_parsing_task({
  files: [
    { name: "document1.pdf", is_ocr: true },
    { name: "document2.docx" }
  ],
  enable_formula: true,
  language: "ch"
});

// Query batch task results (applicable to both batch parsing methods)
const batchStatus = await get_batch_task_results({
  batch_id: batchResult.batch_id
});
```

Batch URL Parsing

```typescript
// Create batch URL parsing task
const batchUrlResult = await create_batch_url_parsing_task({
  files: [
    { url: "https://example.com/doc1.pdf", is_ocr: true },
    { url: "https://example.com/doc2.docx" }
  ],
  enable_formula: true,
  language: "en"
});

// Query batch task results (applicable to both batch parsing methods)
const batchUrlStatus = await get_batch_task_results({
  batch_id: batchUrlResult.batch_id
});
```

Development

```bash
npm run dev
```

Important Notes

  1. A single file cannot exceed 200 MB or 600 pages
  2. Each account gets 2000 pages of highest-priority parsing quota per day
  3. Due to network restrictions, URLs on overseas services such as GitHub and AWS may time out
  4. Batch upload file links are valid for 24 hours
  5. There is no need to set a Content-Type header when uploading files
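The size, page, and link-lifetime limits above can be checked before a request is sent. This is a minimal sketch with hypothetical names (`FileInfo`, `checkLimits`, `isUploadLinkExpired`); it assumes "200 MB" means 200 × 1024 × 1024 bytes, which the notes do not specify, and that the caller already knows the file's size and page count.

```typescript
// Limits from the notes above.
// Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 200 * 1024 * 1024;
const MAX_PAGES = 600;
const UPLOAD_LINK_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical description of a local file.
interface FileInfo {
  sizeBytes: number;
  pageCount: number;
}

// Returns a list of limit violations; an empty list means the file fits.
function checkLimits(file: FileInfo): string[] {
  const problems: string[] = [];
  if (file.sizeBytes > MAX_FILE_BYTES) problems.push("file exceeds 200 MB");
  if (file.pageCount > MAX_PAGES) problems.push("file exceeds 600 pages");
  return problems;
}

// True when a batch upload link issued at issuedAtMs is no longer valid at nowMs.
function isUploadLinkExpired(issuedAtMs: number, nowMs: number): boolean {
  return nowMs - issuedAtMs >= UPLOAD_LINK_TTL_MS;
}
```

Files that fail these checks would otherwise come back as errors -60005 or -60006 after the request has been made.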

Common Error Codes

| Error Code | Description | Solution |
| --- | --- | --- |
| A0202 | Token error | Check that the token is correct, or replace it with a new token |
| A0211 | Token expired | Replace it with a new token |
| -500 | Parameter error | Ensure parameter types and Content-Type are correct |
| -10001 | Service exception | Try again later |
| -10002 | Request parameter error | Check the request parameter format |
| -60001 | Failed to generate upload URL | Try again later |
| -60002 | Failed to get matching file format | File type detection failed; ensure the requested filename and link have correct extensions and the file is one of pdf, doc, docx, ppt, pptx, png, jp(e)g |
| -60003 | File read failed | Check whether the file is corrupted and re-upload |
| -60004 | Empty file | Upload a valid file |
| -60005 | File size exceeds limit | Check the file size; the maximum is 200 MB |
| -60006 | File page count exceeds limit | Split the file and try again |
| -60007 | Model service temporarily unavailable | Try again later or contact technical support |
| -60008 | File read timeout | Check whether the URL is accessible |
| -60009 | Task submission queue is full | Try again later |
| -60010 | Parsing failed | Try again later |
| -60011 | Failed to get valid file | Ensure the file has been uploaded |
| -60012 | Task not found | Ensure task_id is valid and the task has not been deleted |
| -60013 | No permission to access this task | You can only access tasks you submitted |
| -60014 | Delete running task | Running tasks cannot be deleted |
| -60015 | File conversion failed | Convert the file to PDF manually and upload it |
| -60016 | File conversion failed | Conversion to the specified format failed; try another export format or retry |
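The solutions in the table fall into three rough groups: retry later, replace the token, or fix the request or file. The sketch below encodes that grouping so a client can decide what to do with a code. `actionFor` and the grouping itself are an illustrative reading of the table, not part of the server, and the function accepts both strings and numbers because the table does not say how codes are typed in responses.

```typescript
// Codes whose documented solution is to try again later.
const RETRYABLE_CODES = new Set<string>([
  "-10001", "-60001", "-60007", "-60009", "-60010",
]);

// Codes that indicate a token problem.
const TOKEN_CODES = new Set<string>(["A0202", "A0211"]);

type ErrorAction = "retry" | "refresh-token" | "fix-request";

// Map an error code to a suggested client action.
// Codes not in either set, including unknown ones, fall through to
// "fix-request" as the conservative choice.
function actionFor(code: string | number): ErrorAction {
  const key = String(code);
  if (RETRYABLE_CODES.has(key)) return "retry";
  if (TOKEN_CODES.has(key)) return "refresh-token";
  return "fix-request";
}
```

Only the "retry" group is a reasonable candidate for automatic retries; the other two need a change on the caller's side first.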

License

ISC
