Document Processing

Name: Document Processing
Rating: 5 (4155 reviews)
Author: baiyunrei2025

office-docs

by baiyunrei2025

Comprehensive document processing for Microsoft Word (.docx) and WPS Office files. Use when Codex needs to work with professional documents for: (1) Creating new documents, (2) Modifying or editing content, (3) Converting between formats, (4) Extracting text and metadata, (5) Troubleshooting document issues, (6) Batch processing documents, or any other Office document tasks.

4.2k | Efficiency and Workflow | Not scanned | March 23, 2026

Installation

claude skill add --url github.com/openclaw/skills/tree/main/skills/baiyunrei2025/office-docs

Documentation

Office Documents Skill

This skill provides comprehensive tools and workflows for working with Microsoft Word (.docx) and WPS Office documents. It covers creation, editing, conversion, analysis, and troubleshooting of professional documents.

Quick Start

Basic Operations

Read document content:

python

# Use python-docx for .docx files
from docx import Document
doc = Document('document.docx')
text = '\n'.join([paragraph.text for paragraph in doc.paragraphs])

Create new document:

python

from docx import Document

doc = Document()
doc.add_heading('Document Title', 0)
doc.add_paragraph('This is a new paragraph.')
doc.save('new_document.docx')

Common Tasks

Text extraction - See TEXT_EXTRACTION.md
Format conversion - See CONVERSION.md
Document analysis - See ANALYSIS.md
Troubleshooting - See TROUBLESHOOTING.md

Core Tools and Libraries

Python Libraries

For .docx files:

python-docx - Primary library for reading/writing .docx
docx2txt - Simple text extraction
docxcompose - Advanced document composition
docx-mailmerge - Mail merge functionality

For WPS files:

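pywps - Not suitable for this purpose: the PyPI package of this name implements the OGC Web Processing Service standard and is unrelated to WPS Office
Convert WPS files to .docx first, then process them with the .docx libraries above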

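For format conversion (command-line tools rather than Python libraries; call them via subprocess):

pandoc - Universal document converter
libreoffice - Office suite with a headless conversion mode
unoconv - Converter built on LibreOffice (deprecated upstream in favor of unoserver)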

Command Line Tools

Document conversion:

bash

# Convert .docx to PDF
libreoffice --headless --convert-to pdf document.docx

# Convert .docx to plain text (without -t plain, pandoc writes Markdown to a .txt target)
pandoc document.docx -t plain -o document.txt

# Batch convert WPS to .docx
for file in *.wps; do libreoffice --headless --convert-to docx "$file"; done

Document analysis:

bash

# Extract metadata
exiftool document.docx

# Identify the file type, then test the ZIP container (a .docx is a ZIP archive)
file document.docx
unzip -t document.docx
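The same metadata can be read without external tools, because a .docx is a ZIP archive whose core properties are conventionally stored in docProps/core.xml. The following is a minimal sketch using only the standard library; the function name read_core_properties is illustrative and not part of this skill's scripts.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"  # conventional location of the core properties

def read_core_properties(docx_path):
    """Return {property name: text} from docProps/core.xml, or {} if the part is absent."""
    with zipfile.ZipFile(docx_path) as archive:
        if CORE_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PART))
    # Drop the XML namespace prefix, e.g. '{http://purl.org/dc/elements/1.1/}creator' -> 'creator'
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}
```

Typical keys include creator, title, created, and modified, depending on what the authoring application wrote.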

Workflows

1. Document Creation Workflow

When creating new documents:

Choose template - Start from template or create from scratch
Add structure - Headings, paragraphs, lists
Apply formatting - Styles, fonts, spacing
Add elements - Tables, images, hyperlinks
Finalize - Page setup, headers/footers, save

See CREATION.md for detailed patterns.

2. Document Editing Workflow

When modifying existing documents:

Backup original - Always create backup first
Analyze structure - Understand document layout
Make changes - Edit content, update formatting
Preserve formatting - Maintain original styles
Validate - Check for corruption, save new version

See EDITING.md for detailed patterns.
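The backup, change, and validate steps can be wrapped in one helper so that a failed edit never replaces the original. The sketch below uses only the standard library; edit_safely and the edit_fn(src, dst) callback are hypothetical names, and the validation is deliberately shallow (it only confirms the result is a ZIP archive containing word/document.xml).

```python
import os
import shutil
import tempfile
import zipfile

def edit_safely(path, edit_fn):
    """Back up `path`, let edit_fn(src, dst) write an edited copy, validate it, then swap it in."""
    backup_path = path + ".backup"
    shutil.copy2(path, backup_path)                      # backup original first
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".docx", dir=folder)
    os.close(fd)
    try:
        edit_fn(path, tmp_path)                          # caller analyzes and changes the document
        with zipfile.ZipFile(tmp_path) as archive:       # validate before touching the original
            if "word/document.xml" not in archive.namelist():
                raise ValueError("edited file has no word/document.xml")
        os.replace(tmp_path, path)                       # replace the original only on success
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return backup_path
```

With python-docx, edit_fn would typically open src with Document(src), apply the changes, and call doc.save(dst).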

3. Conversion Workflow

When converting between formats:

Identify source format - .docx, .wps, .doc, .rtf, etc.
Choose conversion tool - Based on format and requirements
Convert - With appropriate options
Verify - Check content preservation
Clean up - Remove temporary files

See CONVERSION.md for detailed patterns.
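The convert and verify steps can be driven from Python by calling LibreOffice through subprocess. This is a sketch under assumptions: the helper names are illustrative, the binary is assumed to be on PATH as libreoffice or soffice, and target_format is assumed to be a plain extension such as "pdf" or "docx" (not an extension with a filter suffix).

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(source, target_format, out_dir, binary="libreoffice"):
    """Build the headless LibreOffice command line without running it."""
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", str(out_dir), str(source)]

def convert(source, target_format, out_dir, timeout=120):
    """Convert one file and verify that a non-empty output file was produced."""
    source, out_dir = Path(source), Path(out_dir)
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        raise RuntimeError("LibreOffice was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, target_format, out_dir, binary),
                   check=True, timeout=timeout)
    result = out_dir / f"{source.stem}.{target_format}"
    if not result.exists() or result.stat().st_size == 0:
        raise RuntimeError(f"conversion produced no output for {source}")
    return result
```

Checking that the output exists only confirms the conversion ran; content preservation still needs a manual or text-level comparison.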

Common Issues and Solutions

1. Corrupted Documents

Symptoms: Won't open, error messages, missing content

Solutions:

Try opening in different application
Use recovery mode in Word/WPS
Extract content directly from the ZIP container (word/document.xml) when python-docx cannot open the file
Convert to different format and back

See TROUBLESHOOTING.md for detailed recovery procedures.
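Before attempting recovery, it helps to find out what is actually wrong with the file. The following sketch checks the ZIP container with the standard library only; diagnose_docx is an illustrative name, and the two required parts listed are the minimum this sketch looks for rather than a full validity check.

```python
import zipfile

REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")

def diagnose_docx(path):
    """Return a list of problems found; an empty list means the container looks sound."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive (possibly a legacy binary format or a truncated file)"]
    problems = []
    try:
        with zipfile.ZipFile(path) as archive:
            bad_member = archive.testzip()   # first member failing its CRC check, or None
            if bad_member is not None:
                problems.append(f"CRC error in {bad_member}")
            names = set(archive.namelist())
            for part in REQUIRED_PARTS:
                if part not in names:
                    problems.append(f"missing part: {part}")
    except zipfile.BadZipFile as exc:
        problems.append(f"damaged archive: {exc}")
    return problems
```

A "not a ZIP archive" result for a file named .docx often means it is really an older binary document with the wrong extension, in which case conversion is a better route than repair.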

2. Formatting Issues

Symptoms: Wrong fonts, broken layout, missing styles

Solutions:

Check style definitions
Verify font availability
Use template-based approach
Simplify complex formatting

3. Compatibility Problems

Symptoms: Different appearance in Word vs WPS, missing features

Solutions:

Stick to common features
Test in both applications
Use standard formats
Provide alternative versions

Advanced Features

Document Automation

Batch processing:

python

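import os
from docx import Document

def process_single_document(doc_path):
    # Placeholder: replace with the per-document work you need
    doc = Document(doc_path)
    return len(doc.paragraphs)

def process_documents(folder_path):
    for filename in os.listdir(folder_path):
        # Skip Word lock files such as '~$report.docx'
        if filename.endswith('.docx') and not filename.startswith('~$'):
            doc_path = os.path.join(folder_path, filename)
            process_single_document(doc_path)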
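In a large batch, one unreadable file should not stop the run. The following sketch, using only the standard library, matches the extension case-insensitively, skips the "~$" lock files that Word leaves next to open documents, and collects failures for later review. The names iter_documents, process_all, and handler are illustrative.

```python
from pathlib import Path

def iter_documents(folder_path, recursive=False):
    """Yield .docx paths in folder_path, skipping Word lock files."""
    folder = Path(folder_path)
    candidates = folder.rglob("*") if recursive else folder.iterdir()
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() == ".docx" and not path.name.startswith("~$"):
            yield path

def process_all(folder_path, handler):
    """Call handler(path) for every document; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for path in iter_documents(folder_path):
        try:
            handler(path)
            succeeded.append(path)
        except Exception as exc:   # keep going; record the failure and its message
            failed.append((path, str(exc)))
    return succeeded, failed
```

The failed list pairs each path with its error message, which makes it easy to retry only those files.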

Template-based generation:

python

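from docx import Document

def generate_from_template(template_path, data):
    doc = Document(template_path)
    # Replace '{{ key }}' placeholders in body paragraphs with data values
    for paragraph in doc.paragraphs:
        for key, value in data.items():
            placeholder = f'{{{{ {key} }}}}'
            if placeholder in paragraph.text:
                # Note: assigning paragraph.text collapses the paragraph into a
                # single run, so character-level formatting (bold, fonts) is lost
                paragraph.text = paragraph.text.replace(placeholder, str(value))
    return doc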

Document Analysis

Extract statistics:

python

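from docx import Document

def analyze_document(doc_path):
    doc = Document(doc_path)
    stats = {
        'paragraphs': len(doc.paragraphs),  # body paragraphs only, not table cells
        'tables': len(doc.tables),
        'images': len(doc.inline_shapes),   # inline shapes only; floating images are not counted
        'sections': len(doc.sections),
        'styles': len(doc.styles)
    }
    return stats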

Check formatting consistency:

python

def check_formatting(doc):
    issues = []
    for i, para in enumerate(doc.paragraphs):
        if para.style.name == 'Normal' and para.text.strip():
            # Multiple runs hint at mixed direct formatting; Word also splits
            # runs for other reasons, so treat each hit as a prompt to inspect
            if len(para.runs) > 1:
                issues.append(f"Paragraph {i}: Multiple runs in Normal style")
    return issues

Best Practices

1. Always Backup

python

import shutil

def backup_document(filepath):
    backup_path = filepath + '.backup'
    shutil.copy2(filepath, backup_path)
    return backup_path
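A fixed '.backup' suffix overwrites the previous backup each time. A timestamped name keeps every version and leaves the .docx extension intact so the copy opens directly. This is a minimal sketch with illustrative names; the now parameter exists only to make the result reproducible.

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_with_timestamp(filepath, backup_dir=None, now=None):
    """Copy filepath to '<stem>.<YYYYMMDD-HHMMSS><suffix>' and return the new path."""
    source = Path(filepath)
    target_dir = Path(backup_dir) if backup_dir else source.parent
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}.{stamp}{source.suffix}"
    shutil.copy2(source, target)   # copy2 also preserves file timestamps
    return target
```

For example, backing up report.docx at 03:04:05 on 2 January 2024 yields report.20240102-030405.docx.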

2. Use Version Control

Save incremental versions
Use descriptive filenames
Document changes made

3. Test Thoroughly

Test in target application
Verify all content preserved
Check formatting integrity

4. Handle Errors Gracefully

python

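from docx import Document

def load_text(filepath):
    try:
        doc = Document(filepath)
    except Exception as e:
        print(f"Error opening {filepath}: {e}")
        # Try alternative methods (extract_text_fallback is a helper you supply)
        return extract_text_fallback(filepath)
    return '\n'.join(paragraph.text for paragraph in doc.paragraphs)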
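The extract_text_fallback helper named above is not defined in this document. One possible implementation, assuming the file is still a readable ZIP archive, reads word/document.xml directly with the standard library. It recovers plain paragraph text only and ignores headers, footers, and formatting.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used for <w:p> (paragraph) and <w:t> (text) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_text_fallback(filepath):
    """Return paragraph text read straight from word/document.xml, one paragraph per line."""
    with zipfile.ZipFile(filepath) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every text node inside the paragraph, whatever run it belongs to
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

If the archive itself is damaged this still raises zipfile.BadZipFile, at which point conversion through another application is the remaining option.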

Reference Files

For detailed information on specific topics, consult these reference files:

TEXT_EXTRACTION.md - Text extraction methods and patterns
CONVERSION.md - Format conversion guides
ANALYSIS.md - Document analysis techniques
TROUBLESHOOTING.md - Common issues and solutions
CREATION.md - Document creation patterns
EDITING.md - Document editing workflows
AUTOMATION.md - Automation scripts and templates

Scripts

Available scripts in the scripts/ directory:

extract_text.py - Extract text from .docx files
convert_format.py - Convert between document formats
batch_process.py - Process multiple documents
document_stats.py - Generate document statistics
repair_document.py - Attempt to repair corrupted documents

Run scripts with appropriate parameters:

bash

python scripts/extract_text.py input.docx output.txt

Getting Help

If you encounter issues not covered in this skill:

Check the relevant reference file
Search for specific error messages
Try alternative approaches
Consider converting to simpler format

Remember: When in doubt, create a backup and work on a copy.
