Mineru Document Parsing Server
Productivity and Workflow, by demomagic
Provides document parsing through the Mineru API: single-file and batch parsing across multiple formats, with OCR, formula recognition, and table recognition. Task status can be queried while parsing runs, and documents in multiple languages are supported.
README
Mineru MCP Server
A Model Context Protocol (MCP) server that integrates with the Mineru API to provide document parsing.
Features
- Single File Parsing: Create a parsing task for a document from its URL
- Batch File Parsing: Upload and parse multiple files in one batch
- Task Status Monitoring: Query parsing progress and results
- Multi-format Support: PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG, and other formats
- OCR Functionality: Optional OCR text recognition
- Formula Recognition: Mathematical formula recognition
- Table Recognition: Table structure recognition
- Multi-language Support: Chinese, English, and other languages
Installation
npm install
Configuration
Before using the server, configure your Mineru API key:
const config = {
  mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
  mineruBaseUrl: "https://mineru.net/api/v4" // Mineru API base URL
};
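One way to keep the token out of source files is to read it from the environment and derive the request header from the config object. The sketch below is only an illustration: the `MINERU_API_KEY` variable name and the `loadConfig` and `buildAuthHeaders` helpers are hypothetical and are not part of this server.

```javascript
// Hypothetical helpers: the MINERU_API_KEY name, loadConfig, and
// buildAuthHeaders are illustrative and do not come from this project.
function loadConfig(env) {
  const mineruApiKey = env.MINERU_API_KEY;
  if (!mineruApiKey) {
    throw new Error("MINERU_API_KEY is not set");
  }
  return {
    mineruApiKey,
    mineruBaseUrl: "https://mineru.net/api/v4", // base URL from the config above
  };
}

// Build the Authorization header for a Bearer-token request.
function buildAuthHeaders(config) {
  return { Authorization: `Bearer ${config.mineruApiKey}` };
}

const exampleConfig = loadConfig({ MINERU_API_KEY: "your-mineru-api-bearer-token" });
console.log(buildAuthHeaders(exampleConfig).Authorization);
```

In a real process, `loadConfig(process.env)` would take the place of the literal object passed here.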
Available Tools
1. create_parsing_task
Create a document parsing task for a single file
Parameters:
- url (required): File URL
- is_ocr (optional): Enable OCR, default false
- enable_formula (optional): Enable formula recognition, default true
- enable_table (optional): Enable table recognition, default true
- language (optional): Document language, default "ch"
- page_ranges (optional): Page ranges, e.g., "1-10,15-20"
- model_version (optional): Model version, "v1" or "v2"
- extra_formats (optional): Additional export formats, ["docx", "html", "latex"]
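The defaults and the `page_ranges` format listed above can be checked on the client before a task is created. This is a minimal sketch, not the server's own validation: `buildParsingTaskArgs` and `isValidPageRanges` are hypothetical names, and the accepted `page_ranges` grammar is inferred only from the "1-10,15-20" example.

```javascript
// Defaults as listed in the parameter list above.
const DEFAULTS = {
  is_ocr: false,
  enable_formula: true,
  enable_table: true,
  language: "ch",
};

// Accepts comma-separated pages or ranges such as "1-10,15-20".
// The exact grammar the API accepts is an assumption based on that example.
function isValidPageRanges(value) {
  if (!/^\d+(-\d+)?(,\d+(-\d+)?)*$/.test(value)) return false;
  return value.split(",").every((part) => {
    const [start, end = start] = part.split("-").map(Number);
    return start >= 1 && start <= end;
  });
}

// Hypothetical helper: fill in defaults and reject obviously bad input.
function buildParsingTaskArgs(args) {
  if (typeof args.url !== "string" || args.url === "") {
    throw new Error("url is required");
  }
  if (args.page_ranges !== undefined && !isValidPageRanges(args.page_ranges)) {
    throw new Error(`invalid page_ranges: ${args.page_ranges}`);
  }
  // Caller-supplied values override the defaults.
  return { ...DEFAULTS, ...args };
}
```

Rejecting a reversed range such as "10-1" locally avoids a round trip that would likely end in a parameter error.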
2. get_task_status
Query parsing task status
Parameters:
- task_id (required): Task ID
3. create_batch_parsing_task
Create a batch file upload parsing task (for local file uploads)
Parameters:
- files (required): File array; each file contains name, is_ocr, page_ranges, and other properties
- enable_formula (optional): Enable formula recognition
- enable_table (optional): Enable table recognition
- language (optional): Document language
- model_version (optional): Model version
- extra_formats (optional): Additional export formats
4. create_batch_url_parsing_task
Create a batch URL parsing task (for remote file URLs)
Parameters:
- files (required): File array; each file contains url, is_ocr, page_ranges, and other properties
- enable_formula (optional): Enable formula recognition
- enable_table (optional): Enable table recognition
- language (optional): Document language
- model_version (optional): Model version
- extra_formats (optional): Additional export formats
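Because file type detection relies on the extension, it can help to filter URLs before building the `files` array for a batch URL task. The sketch below assumes the extension list given in this README (which also mentions "other formats", so the list may be incomplete); `extensionOf` and `buildBatchUrlFiles` are hypothetical helper names.

```javascript
// Extensions taken from the supported formats listed in this README.
const SUPPORTED_EXTENSIONS = ["pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg"];

// Return the lowercase extension of the URL path, or "" if there is none.
function extensionOf(fileUrl) {
  const pathname = new URL(fileUrl).pathname;
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  return dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
}

// Split URLs into entries for the files array and a list of rejected URLs.
// perFileOptions (for example { is_ocr: true }) is copied onto every entry.
function buildBatchUrlFiles(urls, perFileOptions = {}) {
  const files = [];
  const rejected = [];
  for (const url of urls) {
    if (SUPPORTED_EXTENSIONS.includes(extensionOf(url))) {
      files.push({ url, ...perFileOptions });
    } else {
      rejected.push(url);
    }
  }
  return { files, rejected };
}
```

The resulting `files` array can be passed as the `files` parameter, while `rejected` can be reported to the user instead of being submitted.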
5. get_batch_task_results
Query batch parsing task results (supports both URL batch parsing and local upload batch parsing)
Parameters:
- batch_id (required): Batch task ID (from create_batch_url_parsing_task or create_batch_parsing_task)
Usage Examples
Single File Parsing
// Create parsing task
const taskResult = await create_parsing_task({
  url: "https://example.com/document.pdf",
  is_ocr: true,
  enable_formula: true,
  language: "en"
});

// Query task status
const status = await get_task_status({
  task_id: taskResult.task_id
});
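Parsing is asynchronous, so a single status query will often return before the task has finished. A simple polling loop is one way to wait for a result. The sketch below is an assumption-laden illustration: `pollUntilFinished` is a hypothetical helper, and because the shape of the status response is not documented here, the caller supplies the predicate that decides when the task is finished.

```javascript
// Poll a status function until the caller-supplied predicate reports completion.
// fetchStatus stands in for a call such as get_task_status; isFinished is passed
// in because the fields of the status response are not documented in this README.
async function pollUntilFinished(fetchStatus, isFinished, options = {}) {
  const { intervalMs = 2000, maxAttempts = 30 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isFinished(status)) return status;
    if (attempt < maxAttempts) {
      // Wait before the next query so the API is not hammered.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`task did not finish after ${maxAttempts} attempts`);
}
```

With the example above, `fetchStatus` would be `() => get_task_status({ task_id: taskResult.task_id })`, and `isFinished` would inspect whatever completion field the response actually carries.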
Batch File Upload Parsing
// Create batch upload task
const batchResult = await create_batch_parsing_task({
  files: [
    { name: "document1.pdf", is_ocr: true },
    { name: "document2.docx" }
  ],
  enable_formula: true,
  language: "ch"
});

// Query batch task results (applicable to both batch parsing methods)
const batchStatus = await get_batch_task_results({
  batch_id: batchResult.batch_id
});
Batch URL Parsing
// Create batch URL parsing task
const batchUrlResult = await create_batch_url_parsing_task({
  files: [
    { url: "https://example.com/doc1.pdf", is_ocr: true },
    { url: "https://example.com/doc2.docx" }
  ],
  enable_formula: true,
  language: "en"
});

// Query batch task results (applicable to both batch parsing methods)
const batchUrlStatus = await get_batch_task_results({
  batch_id: batchUrlResult.batch_id
});
Development
npm run dev
Important Notes
- A single file cannot exceed 200MB or 600 pages
- Each account has a daily quota of 2000 pages that are parsed at the highest priority
- Due to network restrictions, requests for files hosted outside China, such as on GitHub or AWS, may time out
- Batch upload file links are valid for 24 hours
- There is no need to set a Content-Type header when uploading files
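The size and page limits above can be checked locally before a file is submitted. This is a minimal sketch with a hypothetical `checkFileLimits` helper; treating "200MB" as 200 × 1024 × 1024 bytes is an assumption, since the notes do not say how the API counts megabytes.

```javascript
// Limits from the notes above. Treating "200MB" as 200 * 1024 * 1024 bytes
// is an assumption; the API may count megabytes differently.
const MAX_FILE_BYTES = 200 * 1024 * 1024;
const MAX_PAGES = 600;

// Return a list of human-readable problems; an empty list means the file passes.
// pageCount is optional because it is not always known before parsing.
function checkFileLimits({ sizeBytes, pageCount }) {
  const problems = [];
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push("file exceeds 200MB");
  }
  if (pageCount !== undefined && pageCount > MAX_PAGES) {
    problems.push("file exceeds 600 pages");
  }
  return problems;
}
```

A file that fails the page check can be split, or submitted with `page_ranges` covering only part of it.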
Common Error Codes
| Error Code | Description | Solution |
|---|---|---|
| A0202 | Token error | Check if the Token is correct, or replace with a new Token |
| A0211 | Token expired | Replace with a new Token |
| -500 | Parameter error | Ensure parameter types and Content-Type are correct |
| -10001 | Service exception | Please try again later |
| -10002 | Request parameter error | Check request parameter format |
| -60001 | Failed to generate upload URL | Please try again later |
| -60002 | Failed to get matching file format | File type detection failed, ensure the requested filename and link have correct extensions, and the file is one of pdf, doc, docx, ppt, pptx, png, jp(e)g |
| -60003 | File read failed | Check if the file is corrupted and re-upload |
| -60004 | Empty file | Please upload a valid file |
| -60005 | File size exceeds limit | Check the file size; the maximum supported size is 200MB |
| -60006 | File page count exceeds limit | Please split the file and try again |
| -60007 | Model service temporarily unavailable | Please try again later or contact technical support |
| -60008 | File read timeout | Check if URL is accessible |
| -60009 | Task submission queue is full | Please try again later |
| -60010 | Parsing failed | Please try again later |
| -60011 | Failed to get valid file | Please ensure the file has been uploaded |
| -60012 | Task not found | Please ensure task_id is valid and not deleted |
| -60013 | No permission to access this task | You can only access tasks that you submitted |
| -60014 | Attempted to delete a running task | Running tasks cannot be deleted |
| -60015 | File conversion failed | Convert the file to PDF manually and upload it again |
| -60016 | File conversion failed | Conversion to the requested export format failed; try a different export format or retry |
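The table above suggests three broad reactions: replace the token, try again later, or fix the request or file. A client could encode that as a small lookup. The sketch below is one possible grouping based only on the "Solution" column; `classifyError` and its category names are hypothetical.

```javascript
// Codes whose listed solution is "Please try again later".
const RETRYABLE_CODES = new Set([-10001, -60001, -60007, -60009, -60010]);
// Codes that indicate a token problem.
const TOKEN_CODES = new Set(["A0202", "A0211"]);

// Hypothetical classifier; accepts codes as numbers or strings.
// Anything not listed above is treated as needing a change to the request or file.
function classifyError(code) {
  if (TOKEN_CODES.has(String(code))) return "replace-token";
  if (RETRYABLE_CODES.has(Number(code))) return "retry-later";
  return "fix-input";
}

console.log(classifyError("A0211"));
console.log(classifyError(-60009));
console.log(classifyError(-60005));
```

Only the "retry-later" group is a reasonable candidate for automatic retries; the other two need a change on the caller's side first.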
License
ISC