Document Processing
office-docs
by baiyunrei2025
Comprehensive document processing for Microsoft Word (.docx) and WPS Office files. Use when Codex needs to work with professional documents for: (1) Creating new documents, (2) Modifying or editing content, (3) Converting between formats, (4) Extracting text and metadata, (5) Troubleshooting document issues, (6) Batch processing documents, or any other Office document tasks.
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/baiyunrei2025/office-docs
Documentation
Office Documents Skill
This skill provides comprehensive tools and workflows for working with Microsoft Word (.docx) and WPS Office documents. It covers creation, editing, conversion, analysis, and troubleshooting of professional documents.
Quick Start
Basic Operations
Read document content:
# Use python-docx for .docx files
from docx import Document
doc = Document('document.docx')
text = '\n'.join([paragraph.text for paragraph in doc.paragraphs])
Create new document:
from docx import Document
from docx.shared import Inches
doc = Document()
doc.add_heading('Document Title', 0)
doc.add_paragraph('This is a new paragraph.')
doc.save('new_document.docx')
Common Tasks
- Text extraction - See TEXT_EXTRACTION.md
- Format conversion - See CONVERSION.md
- Document analysis - See ANALYSIS.md
- Troubleshooting - See TROUBLESHOOTING.md
Core Tools and Libraries
Python Libraries
For .docx files:
- python-docx - Primary library for reading/writing .docx
- docx2txt - Simple text extraction
- docxcompose - Advanced document composition
- docx-mailmerge - Mail merge functionality
For WPS files:
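- pywps - WPS file manipulation (when available). Verify the package before installing: the PyPI package named pywps implements the OGC Web Processing Service and is unrelated to WPS Office
- Conversion to .docx first recommended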
For format conversion:
- pandoc - Universal document converter
- libreoffice - Office suite for conversion
- unoconv - Universal office converter
Command Line Tools
Document conversion:
# Convert .docx to PDF
libreoffice --headless --convert-to pdf document.docx
# Convert .docx to text
pandoc document.docx -o document.txt
# Batch convert WPS to .docx
for file in *.wps; do libreoffice --headless --convert-to docx "$file"; done
Document analysis:
# Extract metadata
exiftool document.docx
# Check file integrity
file document.docx
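The same checks can be approximated from Python without external tools. The sketch below is illustrative, not part of the skill's scripts: inspect_docx is a hypothetical helper name, and it assumes the main document part sits at the conventional path word/document.xml and that metadata lives in docProps/core.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in an OOXML package.
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def inspect_docx(path):
    """Return basic integrity information and core metadata for a .docx file.

    Hypothetical helper: assumes the conventional part names
    word/document.xml and docProps/core.xml.
    """
    report = {"is_zip": False, "has_main_part": False, "bad_member": None, "metadata": {}}
    if not zipfile.is_zipfile(path):
        return report  # not a ZIP container, so not a readable .docx
    report["is_zip"] = True
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        report["has_main_part"] = "word/document.xml" in names
        # testzip() returns the first member with a bad CRC, or None if all pass.
        report["bad_member"] = package.testzip()
        if "docProps/core.xml" in names:
            root = ET.fromstring(package.read("docProps/core.xml"))
            fields = (("title", "dc:title"), ("creator", "dc:creator"),
                      ("modified", "dcterms:modified"))
            for label, tag in fields:
                node = root.find(tag, CORE_NS)
                if node is not None and node.text:
                    report["metadata"][label] = node.text
    return report
```

Calling inspect_docx('document.docx') returns a dictionary; a file that is not a ZIP archive yields is_zip set to False and no metadata.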
Workflows
1. Document Creation Workflow
When creating new documents:
- Choose template - Start from template or create from scratch
- Add structure - Headings, paragraphs, lists
- Apply formatting - Styles, fonts, spacing
- Add elements - Tables, images, hyperlinks
- Finalize - Page setup, headers/footers, save
See CREATION.md for detailed patterns.
2. Document Editing Workflow
When modifying existing documents:
- Backup original - Always create backup first
- Analyze structure - Understand document layout
- Make changes - Edit content, update formatting
- Preserve formatting - Maintain original styles
- Validate - Check for corruption, save new version
See EDITING.md for detailed patterns.
3. Conversion Workflow
When converting between formats:
- Identify source format - .docx, .wps, .doc, .rtf, etc.
- Choose conversion tool - Based on format and requirements
- Convert - With appropriate options
- Verify - Check content preservation
- Clean up - Remove temporary files
See CONVERSION.md for detailed patterns.
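The convert and verify steps above could be wrapped roughly as follows. This is a sketch under assumptions: build_convert_command and convert_document are hypothetical names, the binary is assumed to be on PATH as libreoffice (some installs expose it as soffice), and target_format is assumed to be a plain extension such as pdf rather than a filter string.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format, out_dir, binary="libreoffice"):
    """Build the headless LibreOffice command line for one conversion."""
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", str(out_dir), str(source)]


def convert_document(source, target_format, out_dir, binary="libreoffice"):
    """Convert one file and verify that a non-empty output file was produced."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, target_format, out_dir, binary),
                   check=True)
    # LibreOffice names the output after the source file's stem.
    result = out_dir / f"{Path(source).stem}.{target_format}"
    if not result.exists() or result.stat().st_size == 0:
        raise RuntimeError(f"Conversion produced no usable output for {source}")
    return result
```

Checking that the output exists only confirms the tool ran; content preservation still needs a manual or text-level comparison.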
Common Issues and Solutions
1. Corrupted Documents
Symptoms: Won't open, error messages, missing content
Solutions:
- Try opening in different application
- Use recovery mode in Word/WPS
- Extract content with python-docx, ignoring errors
- Convert to different format and back
See TROUBLESHOOTING.md for detailed recovery procedures.
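When python-docx refuses to open a file, the text can sometimes still be read straight from the package. The following is a minimal sketch using only the standard library; extract_text_raw is a hypothetical name, and it only helps when the ZIP container and word/document.xml are themselves still readable.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_text_raw(path):
    """Return the text of every w:p element, one paragraph per line.

    Hypothetical fallback: bypasses python-docx and ignores styles,
    numbering, headers and footers. Paragraphs nested inside other
    paragraphs (for example in text boxes) may appear twice.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate all w:t text nodes belonging to this paragraph.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

Unlike doc.paragraphs in python-docx, this walks the whole XML tree, so text inside table cells is included.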
2. Formatting Issues
Symptoms: Wrong fonts, broken layout, missing styles
Solutions:
- Check style definitions
- Verify font availability
- Use template-based approach
- Simplify complex formatting
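To verify font availability, it helps to know which fonts a document declares. A rough sketch, assuming the declarations are stored in word/fontTable.xml as w:font elements; list_declared_fonts is a hypothetical name, and the result still has to be compared against the fonts installed on the target machine.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in fontTable.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_declared_fonts(path):
    """Return the sorted font names declared in word/fontTable.xml.

    Hypothetical helper: returns an empty list when the package has
    no font table part.
    """
    with zipfile.ZipFile(path) as package:
        if "word/fontTable.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("word/fontTable.xml"))
    # Each w:font element carries its family name in the w:name attribute.
    names = [font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font")]
    return sorted(name for name in names if name)
```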
3. Compatibility Problems
Symptoms: Different appearance in Word vs WPS, missing features
Solutions:
- Stick to common features
- Test in both applications
- Use standard formats
- Provide alternative versions
Advanced Features
Document Automation
Batch processing:
import os
from docx import Document

def process_documents(folder_path):
    # process_single_document is a user-supplied function that handles one file
    for filename in os.listdir(folder_path):
        if filename.endswith('.docx'):
            doc_path = os.path.join(folder_path, filename)
            process_single_document(doc_path)
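For larger folders it can help to separate file discovery from processing and to keep going when one file fails. A sketch with hypothetical names (find_docx_files, run_batch); handler stands in for whatever per-document function you supply.

```python
from pathlib import Path


def find_docx_files(folder, recursive=True):
    """Collect .docx files, skipping the ~$ lock files Word creates while editing."""
    folder = Path(folder)
    candidates = folder.rglob("*") if recursive else folder.glob("*")
    found = []
    for path in candidates:
        if not path.is_file():
            continue
        if path.suffix.lower() != ".docx":  # case-insensitive extension match
            continue
        if path.name.startswith("~$"):
            continue
        found.append(path)
    return sorted(found)


def run_batch(folder, handler):
    """Apply handler to every document; record failures instead of stopping."""
    results, failures = {}, {}
    for path in find_docx_files(folder):
        try:
            results[path] = handler(path)
        except Exception as exc:  # one bad file should not abort the batch
            failures[path] = str(exc)
    return results, failures
```

run_batch returns two dictionaries keyed by path, so failed documents can be retried or reported after the run.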
Template-based generation:
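from docx import Document

def generate_from_template(template_path, data):
    doc = Document(template_path)
    # Replace {{ key }} placeholders with data (body paragraphs only, not tables)
    for paragraph in doc.paragraphs:
        for key, value in data.items():
            placeholder = f'{{{{ {key} }}}}'
            if placeholder in paragraph.text:
                # Assigning paragraph.text rebuilds the paragraph as a single run,
                # so run-level formatting (bold, italic) inside it is lost
                paragraph.text = paragraph.text.replace(placeholder, str(value))
    return doc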
Document Analysis
Extract statistics:
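from docx import Document

def analyze_document(doc_path):
    doc = Document(doc_path)
    stats = {
        'paragraphs': len(doc.paragraphs),  # body paragraphs; table cells are not included
        'tables': len(doc.tables),
        'images': len(doc.inline_shapes),  # inline shapes only; floating images are not counted
        'sections': len(doc.sections),
        'styles': len(doc.styles)
    }
    return stats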
Check formatting consistency:
def check_formatting(doc):
    issues = []
    for i, para in enumerate(doc.paragraphs):
        if para.style.name == 'Normal' and para.text.strip():
            # Check for inconsistent formatting
            if len(para.runs) > 1:
                issues.append(f"Paragraph {i}: Multiple runs in Normal style")
    return issues
Best Practices
1. Always Backup
import shutil

def backup_document(filepath):
    # copy2 copies the file contents and preserves metadata such as timestamps
    backup_path = filepath + '.backup'
    shutil.copy2(filepath, backup_path)
    return backup_path
2. Use Version Control
- Save incremental versions
- Use descriptive filenames
- Document changes made
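Incremental versions can be named automatically. The sketch below assumes a name_vN.ext convention, which is an illustrative choice rather than something this skill prescribes; next_version_path is a hypothetical helper.

```python
import re
from pathlib import Path


def next_version_path(path):
    """Return the next unused sibling path of the form name_vN.ext.

    Hypothetical helper: report.docx -> report_v1.docx,
    report_v1.docx -> report_v2.docx, skipping names that already exist.
    """
    path = Path(path)
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    if match:
        base, number = match.group(1), int(match.group(2))
    else:
        base, number = path.stem, 0
    while True:
        number += 1
        candidate = path.with_name(f"{base}_v{number}{path.suffix}")
        if not candidate.exists():
            return candidate
```

Saving to next_version_path(original) instead of the original path keeps every earlier version on disk.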
3. Test Thoroughly
- Test in target application
- Verify all content preserved
- Check formatting integrity
4. Handle Errors Gracefully
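from docx import Document

def load_document(filepath):
    # load_document and extract_text_fallback are placeholder names;
    # supply your own fallback that returns plain text
    try:
        return Document(filepath)
    except Exception as e:
        print(f"Error opening {filepath}: {e}")
        # Try alternative methods
        return extract_text_fallback(filepath)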
Reference Files
For detailed information on specific topics, consult these reference files:
- TEXT_EXTRACTION.md - Text extraction methods and patterns
- CONVERSION.md - Format conversion guides
- ANALYSIS.md - Document analysis techniques
- TROUBLESHOOTING.md - Common issues and solutions
- CREATION.md - Document creation patterns
- EDITING.md - Document editing workflows
- AUTOMATION.md - Automation scripts and templates
Scripts
Available scripts in the scripts/ directory:
- extract_text.py - Extract text from .docx files
- convert_format.py - Convert between document formats
- batch_process.py - Process multiple documents
- document_stats.py - Generate document statistics
- repair_document.py - Attempt to repair corrupted documents
Run scripts with appropriate parameters:
python scripts/extract_text.py input.docx output.txt
Getting Help
If you encounter issues not covered in this skill:
- Check the relevant reference file
- Search for specific error messages
- Try alternative approaches
- Consider converting to simpler format
Remember: When in doubt, create a backup and work on a copy.