Mineru Document Parsing Server

Productivity and Workflow

by demomagic

Provide powerful document parsing capabilities by integrating with the Mineru API. Enable single and batch file parsing with support for multiple formats, OCR, formula, and table recognition. Monitor parsing task status in real time to efficiently process documents in various languages.

README

Mineru MCP Server

A Model Context Protocol (MCP) document parsing server that integrates with the Mineru API to provide powerful document parsing capabilities.

Features

  • Single File Parsing: Create a parsing task for one document from its URL
  • Batch File Parsing: Upload and parse multiple files in one batch
  • Task Status Monitoring: Query parsing progress and results in real time
  • Multi-format Support: PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG, and other formats
  • OCR Functionality: Optional OCR text recognition
  • Formula Recognition: Recognize mathematical formulas
  • Table Recognition: Recognize table structure
  • Multi-language Support: Chinese, English, and other languages

Installation

bash
npm install

Configuration

Before using the server, configure the Mineru API key:

typescript
const config = {
  mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
  mineruBaseUrl: "https://mineru.net/api/v4" // Mineru API base URL
};
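
One way to avoid hard-coding the token is to build the config from environment-style input. The sketch below is only an illustration under stated assumptions: the variable names `MINERU_API_KEY` and `MINERU_BASE_URL` and the helpers `loadConfig` and `authHeaders` are hypothetical and are not defined by this server. The only details taken from this README are the two config fields, the default base URL, and the fact that the key is a Bearer token.

```typescript
// Hypothetical helper: build the config object from environment-style input.
// The env var names below are assumptions, not something this server defines.
interface MineruConfig {
  mineruApiKey: string;
  mineruBaseUrl: string;
}

const DEFAULT_BASE_URL = "https://mineru.net/api/v4"; // value shown in the README config

function loadConfig(env: Record<string, string | undefined>): MineruConfig {
  const key = env["MINERU_API_KEY"]?.trim();
  if (!key) {
    throw new Error("MINERU_API_KEY is not set");
  }
  const baseUrl = env["MINERU_BASE_URL"]?.trim() || DEFAULT_BASE_URL;
  // Drop trailing slashes so request paths can be appended consistently.
  return { mineruApiKey: key, mineruBaseUrl: baseUrl.replace(/\/+$/, "") };
}

// The key is described as a Bearer token, so it would normally be sent in an
// Authorization header of this form.
function authHeaders(config: MineruConfig): Record<string, string> {
  return { Authorization: `Bearer ${config.mineruApiKey}` };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; failing early on a missing key gives a clearer message than a token error returned later by the API.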

Available Tools

1. create_parsing_task

Create a document parsing task for a single file

Parameters:

  • url (required): File URL

  • is_ocr (optional): Enable OCR, default false

  • enable_formula (optional): Enable formula recognition, default true

  • enable_table (optional): Enable table recognition, default true

  • language (optional): Document language, default "ch"

  • page_ranges (optional): Page ranges, e.g., "1-10,15-20"

  • model_version (optional): Model version, "v1" or "v2"

  • extra_formats (optional): Additional export formats, ["docx", "html", "latex"]
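
The parameter list above can be written down as a type, which makes the documented defaults explicit. This is a minimal sketch, not the server's actual schema: the names `CreateParsingTaskParams`, `withDefaults`, and `isPlausiblePageRanges` are hypothetical, and the page-range check assumes 1-based pages, inclusive ranges, and that a single page such as "3" is allowed, none of which this README confirms.

```typescript
// Parameter shape as listed above; only `url` is required.
interface CreateParsingTaskParams {
  url: string;
  is_ocr?: boolean;
  enable_formula?: boolean;
  enable_table?: boolean;
  language?: string;
  page_ranges?: string;
  model_version?: "v1" | "v2";
  extra_formats?: Array<"docx" | "html" | "latex">;
}

// Hypothetical helper: fill in the documented defaults so the effective
// request is explicit. Fields without a documented default stay unset.
function withDefaults(params: CreateParsingTaskParams): CreateParsingTaskParams {
  return {
    ...params,
    is_ocr: params.is_ocr ?? false,
    enable_formula: params.enable_formula ?? true,
    enable_table: params.enable_table ?? true,
    language: params.language ?? "ch",
  };
}

// Hypothetical sanity check for strings like "1-10,15-20".
// Assumes 1-based pages and start <= end; the real API may be stricter or looser.
function isPlausiblePageRanges(value: string): boolean {
  return value.split(",").every((part) => {
    const match = /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/.exec(part);
    if (!match) return false;
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    return start >= 1 && end >= start;
  });
}
```

Checking `page_ranges` on the client side is optional; it simply catches obvious typos before a request is sent.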

2. get_task_status

Query parsing task status

Parameters:

  • task_id (required): Task ID

3. create_batch_parsing_task

Create a batch file upload parsing task (for local file uploads)

Parameters:

  • files (required): Array of files; each entry contains name, is_ocr, page_ranges, and other properties
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats

4. create_batch_url_parsing_task

Create a batch URL parsing task (for remote file URLs)

Parameters:

  • files (required): Array of files; each entry contains url, is_ocr, page_ranges, and other properties
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats

5. get_batch_task_results

Query batch parsing task results (supports both URL batch parsing and local upload batch parsing)

Parameters:

  • batch_id (required): Batch task ID (from create_batch_url_parsing_task or create_batch_parsing_task)

Usage Examples

Single File Parsing

typescript
// Create parsing task
const taskResult = await create_parsing_task({
  url: "https://example.com/document.pdf",
  is_ocr: true,
  enable_formula: true,
  language: "en"
});

// Query task status
const status = await get_task_status({
  task_id: taskResult.task_id
});
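
The example above queries the status once. In practice a task usually has to be polled until it finishes. The following is a generic polling sketch, not part of this server: `pollUntilFinished` and its options are hypothetical names, and because this README does not list the status values a task can take, the caller supplies the `isFinished` predicate. The `sleep` function is injected so the loop can be exercised without real delays.

```typescript
// Generic polling sketch. Nothing here is Mineru-specific: the caller supplies
// the status lookup and decides what counts as finished.
interface PollOptions<T> {
  fetchStatus: () => Promise<T>;        // e.g. () => get_task_status({ task_id })
  isFinished: (status: T) => boolean;   // assumption: the caller knows the terminal states
  sleep: (ms: number) => Promise<void>; // injected so it can be replaced in tests
  intervalMs?: number;                  // delay between attempts (arbitrary default)
  maxAttempts?: number;                 // upper bound so the loop always ends
}

async function pollUntilFinished<T>(options: PollOptions<T>): Promise<T> {
  const { fetchStatus, isFinished, sleep, intervalMs = 3000, maxAttempts = 100 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isFinished(status)) {
      return status;
    }
    // Wait only if another attempt will follow.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
}
```

A typical `sleep` is `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. The same helper could wrap `get_batch_task_results` for batch tasks, with a predicate suited to the batch result shape.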

Batch File Upload Parsing

typescript
// Create batch upload task
const batchResult = await create_batch_parsing_task({
  files: [
    { name: "document1.pdf", is_ocr: true },
    { name: "document2.docx" }
  ],
  enable_formula: true,
  language: "ch"
});

// Query batch task results (applicable to both batch parsing methods)
const batchStatus = await get_batch_task_results({
  batch_id: batchResult.batch_id
});

Batch URL Parsing

typescript
// Create batch URL parsing task
const batchUrlResult = await create_batch_url_parsing_task({
  files: [
    { url: "https://example.com/doc1.pdf", is_ocr: true },
    { url: "https://example.com/doc2.docx" }
  ],
  enable_formula: true,
  language: "en"
});

// Query batch task results (applicable to both batch parsing methods)
const batchUrlStatus = await get_batch_task_results({
  batch_id: batchUrlResult.batch_id
});

Development

bash
npm run dev

Important Notes

  1. A single file cannot exceed 200 MB or 600 pages
  2. Each account has a daily quota of 2000 pages parsed at the highest priority
  3. Due to network restrictions, URLs hosted overseas (for example, on GitHub or AWS) may time out
  4. Upload links issued for batch uploads are valid for 24 hours
  5. Do not set a Content-Type header when uploading files
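
Several of the limits above can be checked locally before a task is submitted. The sketch below is an illustration only: `preflightProblems`, `isUploadLinkUsable`, and the `LocalFileInfo` shape are hypothetical, "200MB" is assumed to mean 200 × 1024 × 1024 bytes, and the extension list is taken from the formats this README names, which may not be exhaustive.

```typescript
const MAX_FILE_BYTES = 200 * 1024 * 1024;       // "200MB"; binary megabytes assumed
const MAX_PAGES = 600;                          // page limit from the notes above
const UPLOAD_LINK_TTL_MS = 24 * 60 * 60 * 1000; // upload links are valid for 24 hours
const SUPPORTED_EXTENSIONS = ["pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg"];

// Hypothetical description of a local file; pageCount is optional because it
// is often unknown before parsing.
interface LocalFileInfo {
  name: string;
  sizeBytes: number;
  pageCount?: number;
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function preflightProblems(file: LocalFileInfo): string[] {
  const problems: string[] = [];
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot + 1).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`unsupported extension: "${ext}"`);
  }
  if (file.sizeBytes === 0) {
    problems.push("file is empty");
  }
  if (file.sizeBytes > MAX_FILE_BYTES) {
    problems.push("file is larger than 200 MB");
  }
  if (file.pageCount !== undefined && file.pageCount > MAX_PAGES) {
    problems.push("file has more than 600 pages");
  }
  return problems;
}

// True while a link issued at `issuedAt` is still inside the 24-hour window.
function isUploadLinkUsable(issuedAt: Date, now: Date): boolean {
  const age = now.getTime() - issuedAt.getTime();
  return age >= 0 && age < UPLOAD_LINK_TTL_MS;
}
```

These checks do not replace the server-side validation; they only surface likely failures earlier, before any quota or upload time is spent.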

Common Error Codes

| Error Code | Description | Solution |
| --- | --- | --- |
| A0202 | Token error | Check if the Token is correct, or replace with a new Token |
| A0211 | Token expired | Replace with a new Token |
| -500 | Parameter error | Ensure parameter types and Content-Type are correct |
| -10001 | Service exception | Please try again later |
| -10002 | Request parameter error | Check request parameter format |
| -60001 | Failed to generate upload URL | Please try again later |
| -60002 | Failed to get matching file format | File type detection failed, ensure the requested filename and link have correct extensions, and the file is one of pdf, doc, docx, ppt, pptx, png, jp(e)g |
| -60003 | File read failed | Check if the file is corrupted and re-upload |
| -60004 | Empty file | Please upload a valid file |
| -60005 | File size exceeds limit | Check file size, maximum support 200MB |
| -60006 | File page count exceeds limit | Please split the file and try again |
| -60007 | Model service temporarily unavailable | Please try again later or contact technical support |
| -60008 | File read timeout | Check if URL is accessible |
| -60009 | Task submission queue is full | Please try again later |
| -60010 | Parsing failed | Please try again later |
| -60011 | Failed to get valid file | Please ensure the file has been uploaded |
| -60012 | Task not found | Please ensure task_id is valid and not deleted |
| -60013 | No permission to access this task | Can only access tasks submitted by yourself |
| -60014 | Delete running task | Running tasks do not support deletion |
| -60015 | File conversion failed | Can manually convert to PDF and upload |
| -60016 | File conversion failed | File conversion to specified format failed, can try other format export or retry |
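
The solutions listed above fall into a few groups, which a client can use to decide how to react. The sketch below is one possible grouping, not behavior of this server: `classifyError` and `ErrorKind` are hypothetical names, the "retry" group contains only the codes whose listed solution is to try again later, and how the code is exposed in a response is not specified in this README, so the function accepts it as a string or a number.

```typescript
type ErrorKind = "retry" | "auth" | "other";

// Codes whose listed solution is "try again later" (optionally with a support contact).
const RETRYABLE_CODES = new Set<string>(["-10001", "-60001", "-60007", "-60009", "-60010"]);

// Codes that indicate a token problem.
const AUTH_CODES = new Set<string>(["A0202", "A0211"]);

// Hypothetical helper: map an error code to a coarse reaction.
// Everything else (bad parameters, file problems, permissions) needs a change
// to the request or the file, so it is reported as "other".
function classifyError(code: string | number): ErrorKind {
  const key = String(code);
  if (RETRYABLE_CODES.has(key)) return "retry";
  if (AUTH_CODES.has(key)) return "auth";
  return "other";
}
```

A caller might back off and resubmit on "retry", refresh the token on "auth", and surface the table's solution text to the user for "other". Code -60016 mentions retrying as one option but is left in "other" here because its first suggestion is to choose a different export format.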

License

ISC
