Chunking Strategy

Universal

chunking-strategy

by giuseppe-trisciuoglio

Pick the right chunking strategy for RAG, vector databases, and large-document processing: split content by structure or semantics while balancing context preservation, embedding quality, and retrieval quality.


Installation

claude skill add --url github.com/giuseppe-trisciuoglio/developer-kit/tree/main/plugins/developer-kit-ai/skills/chunking-strategy

Documentation

Chunking Strategy for RAG Systems

Overview

Implement optimal chunking strategies for Retrieval-Augmented Generation (RAG) systems and document processing pipelines. This skill provides a comprehensive framework for breaking large documents into smaller, semantically meaningful segments that preserve context while enabling efficient retrieval and search.

When to Use

Use this skill when building RAG systems, optimizing vector search performance, implementing document processing pipelines, handling multi-modal content, or performance-tuning existing RAG systems with poor retrieval quality.

Instructions

Choose Chunking Strategy

Select appropriate chunking strategy based on document type and use case:

  1. Fixed-Size Chunking (Level 1)

    • Use for simple documents without clear structure
    • Start with 512 tokens and 10-20% overlap
    • Adjust size based on query type: 256 for factoid, 1024 for analytical
  2. Recursive Character Chunking (Level 2)

    • Use for documents with clear structural boundaries
    • Implement hierarchical separators: paragraphs → sentences → words
    • Customize separators for document types such as HTML and Markdown (see the Markdown sketch after this list)
  3. Structure-Aware Chunking (Level 3)

    • Use for structured documents (Markdown, code, tables, PDFs)
    • Preserve semantic units: functions, sections, table blocks
    • Validate structure preservation post-splitting
  4. Semantic Chunking (Level 4)

    • Use for complex documents with thematic shifts
    • Implement embedding-based boundary detection
    • Configure similarity threshold (0.8) and buffer size (3-5 sentences)
  5. Advanced Methods (Level 5)

    • Use Late Chunking for long-context embedding models
    • Apply Contextual Retrieval for high-precision requirements
    • Monitor computational costs vs. retrieval improvements

Reference detailed strategy implementations in references/strategies.md.
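As an illustration of Level 2, the sketch below customizes the separator hierarchy of LangChain's RecursiveCharacterTextSplitter for Markdown so that headings and paragraphs are preferred split points. The separator list, chunk sizes, and sample text are illustrative assumptions, not prescribed values.

python
from langchain.text_splitter import RecursiveCharacterTextSplitter

markdown_text = """## Setup

Install the package and configure credentials.

## Usage

Call the client with your query.
"""

# Illustrative separator hierarchy for Markdown: prefer headings, then
# paragraphs, then sentences, then words.
markdown_splitter = RecursiveCharacterTextSplitter(
    separators=["\n## ", "\n### ", "\n\n", ". ", " ", ""],
    chunk_size=512,
    chunk_overlap=50,
)

chunks = markdown_splitter.split_text(markdown_text)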

Implement Chunking Pipeline

Follow these steps to implement effective chunking:

  1. Pre-process documents

    • Analyze document structure and content types
    • Identify multi-modal content (tables, images, code)
    • Assess information density and complexity
  2. Select strategy parameters

    • Choose chunk size based on embedding model context window
    • Set overlap percentage (10-20% for most cases)
    • Configure strategy-specific parameters
  3. Process and validate

    • Apply chosen chunking strategy
    • Validate semantic coherence of chunks (a token-length validation sketch follows this list)
    • Test with representative documents
  4. Evaluate and iterate

    • Measure retrieval precision and recall
    • Monitor processing latency and resource usage
    • Optimize based on specific use case requirements

Reference detailed implementation guidelines in references/implementation.md.
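A minimal validation sketch for the "process and validate" step, assuming the tiktoken library and an embedding model with a known token limit (the 8191-token limit and cl100k_base encoding are assumptions; adjust them for your model):

python
import tiktoken

def validate_chunks(chunks, max_tokens=8191, encoding_name="cl100k_base"):
    """Flag chunks that exceed the embedding model's token limit."""
    encoding = tiktoken.get_encoding(encoding_name)
    oversized = []

    for i, chunk in enumerate(chunks):
        token_count = len(encoding.encode(chunk))
        if token_count > max_tokens:
            oversized.append((i, token_count))

    return oversized

Re-split or tighten parameters for any flagged chunks before indexing.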

Evaluate Performance

Use these metrics to evaluate chunking effectiveness (a small precision/recall helper follows the list):

  • Retrieval Precision: Fraction of retrieved chunks that are relevant
  • Retrieval Recall: Fraction of relevant chunks that are retrieved
  • End-to-End Accuracy: Quality of final RAG responses
  • Processing Time: Latency impact on overall system
  • Resource Usage: Memory and computational costs

Reference detailed evaluation framework in references/evaluation.md.
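For the first two metrics, a minimal helper, assuming you have labeled sets of relevant chunk IDs per query:

python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Compute retrieval precision and recall from sets of chunk IDs."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)

    if not retrieved or not relevant:
        return 0.0, 0.0

    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return precision, recall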

Examples

Basic Fixed-Size Chunking

python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# RecursiveCharacterTextSplitter enforces the chunk_size cap, so on plain text
# it behaves as a simple fixed-size splitter. Sizes are measured by
# length_function, so these are character counts; pass a tokenizer-based
# length function to count tokens instead.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=25,   # roughly 10% overlap
    length_function=len
)

chunks = splitter.split_documents(documents)

Structure-Aware Code Chunking

python
import ast

def chunk_python_code(code):
    """Split Python code into semantic chunks (one per top-level definition)."""
    tree = ast.parse(code)
    chunks = []

    # Iterate only top-level nodes; ast.walk would also visit nested functions
    # and methods, producing duplicate, overlapping chunks.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(ast.get_source_segment(code, node))

    return chunks
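
Applied to a small module, the function yields one chunk per top-level definition:

python
source = '''
def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"
'''

for chunk in chunk_python_code(source):
    print(chunk)
    print("---")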

Semantic Chunking with Embeddings

python
def semantic_chunk(text, similarity_threshold=0.8):
    """Chunk text at semantic boundaries (drops in similarity between adjacent sentences).

    Relies on helpers defined elsewhere: split_into_sentences, generate_embeddings,
    and cosine_similarity (one possible implementation is sketched below).
    """
    sentences = split_into_sentences(text)
    if not sentences:
        return []

    embeddings = generate_embeddings(sentences)

    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i - 1], embeddings[i])

        # A similarity drop below the threshold marks a topic boundary: close the chunk.
        if similarity < similarity_threshold:
            chunks.append(" ".join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])

    chunks.append(" ".join(current_chunk))
    return chunks
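
The helpers used above are placeholders. One possible implementation, as a minimal sketch assuming the sentence-transformers and numpy packages and a naive regex sentence splitter (the model name is an illustrative choice, not a requirement):

python
import re
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def split_into_sentences(text):
    # Naive split on sentence-ending punctuation; use a proper sentence
    # tokenizer (nltk, spaCy) for production text.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def generate_embeddings(sentences):
    return _model.encode(sentences)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))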

Best Practices

Core Principles

  • Balance context preservation with retrieval precision
  • Maintain semantic coherence within chunks
  • Optimize for embedding model constraints
  • Preserve document structure when beneficial

Implementation Guidelines

  • Start simple with fixed-size chunking (512 tokens, 10-20% overlap)
  • Test thoroughly with representative documents
  • Monitor both accuracy metrics and computational costs
  • Iterate based on specific document characteristics

Common Pitfalls to Avoid

  • Over-chunking: Creating too many small, context-poor chunks
  • Under-chunking: Missing relevant information due to oversized chunks
  • Ignoring document structure and semantic boundaries
  • Using one-size-fits-all approach for diverse content types
  • Neglecting overlap for boundary-crossing information

Constraints and Warnings

Resource Considerations

  • Semantic and contextual methods require significant computational resources
  • Late chunking needs long-context embedding models
  • Complex strategies increase processing latency
  • Monitor memory usage for large document processing

Quality Requirements

  • Validate chunk semantic coherence post-processing
  • Test with domain-specific documents before deployment
  • Ensure chunks maintain standalone meaning where possible
  • Implement proper error handling for edge cases

References

Reference detailed documentation in the references/ folder.
