Retrieval-Augmented Generation
rag
by giuseppe-trisciuoglio
A practical RAG pattern combining vector databases, semantic retrieval, and context injection into prompts. Suited to building document Q&A, knowledge-base assistants, and AI applications driven by external knowledge; it improves answer accuracy and reduces hallucinations.
Installation
claude skill add --url github.com/giuseppe-trisciuoglio/developer-kit/tree/main/plugins/developer-kit-ai/skills/rag
Documentation
RAG Implementation
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
Overview
RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.
When to Use
Use this skill when:
- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling AI systems to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
- Developing knowledge management systems
Instructions
Step 1: Choose Vector Database
Select an appropriate vector database based on your requirements:
- For production scalability: Use Pinecone or Milvus
- For open-source requirements: Use Weaviate or Qdrant
- For local development: Use Chroma or FAISS
- For hybrid search needs: Use Weaviate with BM25 support
Step 2: Select Embedding Model
Choose an embedding model based on your use case:
- General purpose: text-embedding-ada-002 (OpenAI)
- Fast and lightweight: all-MiniLM-L6-v2
- Multilingual support: e5-large-v2
- Best performance: bge-large-en-v1.5
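Whichever model you pick, retrieval compares the resulting vectors, most often by cosine similarity. A minimal dependency-free sketch (the toy vectors are illustrative):

```java
public class CosineSimilarity {
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {0.2f, 0.8f, 0.1f};
        float[] doc1  = {0.25f, 0.75f, 0.05f}; // points in a similar direction
        float[] doc2  = {0.9f, 0.05f, 0.4f};   // points in a different direction
        System.out.println(cosine(query, doc1) > cosine(query, doc2)); // true
    }
}
```

Real embedding stores apply the same comparison, just with approximate nearest-neighbor indexes instead of a linear scan.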
Step 3: Implement Document Processing Pipeline
- Load documents from your source (file system, database, API)
- Clean and preprocess documents (remove formatting artifacts, normalize text)
- Split documents into chunks using appropriate chunking strategy
- Generate embeddings for each chunk
- Store embeddings in your vector database with metadata
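The split step above can be sketched with a naive fixed-size chunker; this is a stand-in for a production splitter, and the 500/100 size and overlap values are only illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Fixed-size chunking with overlap: each chunk starts (size - overlap)
    // characters after the previous one, so boundary context is repeated.
    static List<String> chunk(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = chunk("a".repeat(1200), 500, 100);
        // 1200 chars with step 400 -> chunks start at 0, 400, 800
        System.out.println(chunks.size()); // 3
    }
}
```

A production splitter would additionally respect sentence and paragraph boundaries instead of cutting mid-word.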
Step 4: Configure Retrieval Strategy
- Dense Retrieval: Use semantic similarity via embeddings for most use cases
- Hybrid Search: Combine dense + sparse retrieval for better coverage
- Metadata Filtering: Add filters based on document attributes
- Reranking: Implement cross-encoder reranking for high-precision requirements
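A common way to implement hybrid search is Reciprocal Rank Fusion (RRF), which merges ranked lists using only each document's rank in each list. A minimal sketch (the document IDs are illustrative; k = 60 is the conventional damping constant):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfFusion {
    // RRF score: sum over result lists of 1 / (k + rank), rank starting at 1.
    // k dampens the advantage of top ranks; 60 is the commonly used default.
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }

    public static void main(String[] args) {
        List<String> dense  = List.of("doc1", "doc2", "doc3"); // semantic results
        List<String> sparse = List.of("doc2", "doc4", "doc1"); // keyword results
        // doc2 ranks well in both lists, so it fuses to the top
        System.out.println(fuse(List.of(dense, sparse), 60).get(0)); // doc2
    }
}
```

RRF needs no score calibration between retrievers, which is why it is a popular default for combining dense and sparse results.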
Step 5: Build RAG Pipeline
- Create content retriever with your embedding store
- Configure AI service with retriever and chat memory
- Implement prompt template with context injection
- Add response validation and grounding checks
Step 6: Evaluate and Optimize
- Measure retrieval metrics (precision@k, recall@k, MRR)
- Evaluate answer quality (faithfulness, relevance)
- Monitor performance and user feedback
- Iterate on chunking, retrieval, and prompt parameters
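The retrieval metrics above are cheap to compute offline given each query's ranked results and a labeled set of relevant document IDs. A minimal sketch:

```java
import java.util.List;
import java.util.Set;

public class RetrievalMetrics {
    // precision@k: fraction of the top-k results that are relevant
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    // Reciprocal rank: 1 / position of the first relevant result (0 if none).
    // MRR is this value averaged over a set of evaluation queries.
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("d3", "d1", "d7", "d2");
        Set<String> relevant = Set.of("d1", "d2");
        System.out.println(precisionAtK(ranked, relevant, 2)); // 0.5
        System.out.println(reciprocalRank(ranked, relevant));  // 0.5
    }
}
```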
Examples
Example 1: Basic Document Q&A System
// Simple RAG setup for document Q&A
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
.chatModel(chatModel)
.contentRetriever(EmbeddingStoreContentRetriever.from(store))
.build();
String answer = assistant.answer("What is the company policy on remote work?");
Example 2: Metadata-Filtered Retrieval
// RAG with metadata filtering for specific document categories
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(store)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
Example 3: Multi-Source RAG Pipeline
// Combine multiple knowledge sources
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);
List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));
// Rerank and return the top results (guard against fewer than 5 candidates)
List<Content> reranked = reranker.reorder(query, results);
List<Content> topResults = reranked.subList(0, Math.min(5, reranked.size()));
Example 4: RAG with Chat Memory
// Conversational RAG with context retention
Assistant assistant = AiServices.builder(Assistant.class)
.chatModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(retriever)
.build();
// Multi-turn conversation with context
assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?"); // Maintains context
Core Components
Vector Databases
Store and efficiently retrieve document embeddings for semantic search.
Key Options:
- Pinecone: Managed, scalable, production-ready
- Weaviate: Open-source, hybrid search capabilities
- Milvus: High performance, on-premise deployment
- Chroma: Lightweight, easy local development
- Qdrant: Fast, advanced filtering
- FAISS: Meta's library, full control
Embedding Models
Convert text to numerical vectors for similarity search.
Popular Models:
- text-embedding-ada-002 (OpenAI): General purpose, 1536 dimensions
- all-MiniLM-L6-v2: Fast, lightweight, 384 dimensions
- e5-large-v2: High quality, multilingual
- bge-large-en-v1.5: State-of-the-art performance
Retrieval Strategies
Find relevant content based on user queries.
Approaches:
- Dense Retrieval: Semantic similarity via embeddings
- Sparse Retrieval: Keyword matching (BM25, TF-IDF)
- Hybrid Search: Combine dense + sparse for best results
- Multi-Query: Generate multiple query variations
- Contextual Compression: Extract only relevant parts
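For sparse retrieval, BM25 scores a document by its term overlap with the query, weighted by inverse document frequency and normalized by document length. A minimal sketch over a toy tokenized corpus (k1 = 1.2 and b = 0.75 are the standard defaults):

```java
import java.util.Collections;
import java.util.List;

public class Bm25 {
    static final double K1 = 1.2, B = 0.75;

    // BM25 score of a query against one document in a small in-memory corpus
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(1);
        double total = 0;
        for (String term : query) {
            long df = corpus.stream().filter(d -> d.contains(term)).count();
            if (df == 0) continue;
            // idf rewards rare terms; tf saturation and length normalization below
            double idf = Math.log(1 + (corpus.size() - df + 0.5) / (df + 0.5));
            double tf = Collections.frequency(doc, term);
            total += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * doc.size() / avgLen));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
            List.of("remote", "work", "policy"),
            List.of("office", "hours"),
            List.of("vacation", "policy"));
        List<String> query = List.of("remote", "work");
        // The first document mentions both query terms, so it scores highest
        System.out.println(score(query, corpus.get(0), corpus) >
                           score(query, corpus.get(1), corpus)); // true
    }
}
```

Production systems use an inverted index rather than scanning the corpus per term, but the scoring formula is the same.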
Quick Implementation
Basic RAG Setup
// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Ingest documents into the store
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
// Create AI service with RAG capability
Assistant assistant = AiServices.builder(Assistant.class)
.chatModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
.build();
Document Processing Pipeline
// Split documents into chunks
// (RecursiveCharacterTextSplitter is a LangChain Python class; in LangChain4j
// the equivalent is DocumentSplitters.recursive)
DocumentSplitter splitter = DocumentSplitters.recursive(
    500, // chunk size in characters
    100  // overlap in characters
);
// Create embedding model
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.build();
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
.host("localhost")
.database("postgres")
.user("postgres")
.password(System.getenv("DB_PASSWORD"))
.table("embeddings")
.dimension(1536)
.build();
// Process and store documents
for (Document document : documents) {
List<TextSegment> segments = splitter.split(document);
for (TextSegment segment : segments) {
Embedding embedding = embeddingModel.embed(segment).content();
embeddingStore.add(embedding, segment);
}
}
Implementation Patterns
Pattern 1: Simple Document Q&A
Create a basic Q&A system over your documents.
public interface DocumentAssistant {
String answer(String question);
}
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
.chatModel(chatModel)
.contentRetriever(retriever)
.build();
Pattern 2: Metadata-Filtered Retrieval
Filter results based on document metadata.
// Add metadata during document loading
Document document = Document.builder()
.text("Content here")
.metadata("source", "technical-manual.pdf")
.metadata("category", "technical")
.metadata("date", "2024-01-15")
.build();
// Filter during retrieval
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
Pattern 3: Multi-Source Retrieval
Combine results from multiple knowledge sources.
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);
// Combine results
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));
// Rerank combined results
List<Content> rerankedResults = reranker.reorder(query, allResults);
Best Practices
Document Preparation
- Clean and preprocess documents before ingestion
- Remove irrelevant content and formatting artifacts
- Standardize document structure for consistent processing
- Add relevant metadata for filtering and context
Chunking Strategy
- Use 500-1000 tokens per chunk for optimal balance
- Include 10-20% overlap to preserve context at boundaries
- Consider document structure when determining chunk boundaries
- Test different chunk sizes for your specific use case
Retrieval Optimization
- Start with high k values (10-20), then filter or rerank down to the final context
- Use metadata filtering to improve relevance
- Combine multiple retrieval strategies for better coverage
- Monitor retrieval quality and user feedback
Performance Considerations
- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store configuration for your scale
- Monitor query performance and system resources
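Caching embeddings for frequently accessed content can be as simple as an access-ordered LinkedHashMap used as an LRU cache keyed by input text; a minimal sketch (the capacity and the stand-in embedder are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class EmbeddingCache {
    private final Map<String, float[]> cache;

    // An access-ordered LinkedHashMap evicts the least recently used entry
    // once removeEldestEntry reports the capacity is exceeded.
    EmbeddingCache(int capacity) {
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > capacity;
            }
        };
    }

    // Return the cached vector, computing and storing it on a miss
    float[] getOrCompute(String text, Function<String, float[]> embed) {
        return cache.computeIfAbsent(text, embed);
    }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        EmbeddingCache cache = new EmbeddingCache(2);
        // stand-in embedder; a real one would call the embedding model API
        Function<String, float[]> embed = t -> new float[]{t.length()};
        cache.getOrCompute("alpha", embed);
        cache.getOrCompute("beta", embed);
        cache.getOrCompute("gamma", embed); // evicts "alpha"
        System.out.println(cache.size()); // 2
    }
}
```

For multi-instance deployments a shared store such as Redis is the usual choice; the in-process version above only avoids repeat calls within one JVM.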
Common Issues and Solutions
Poor Retrieval Quality
Problem: Retrieved documents don't match user queries.
Solutions:
- Improve document preprocessing and cleaning
- Adjust chunk size and overlap parameters
- Try different embedding models
- Use hybrid search combining semantic and keyword matching
Irrelevant Results
Problem: Retrieved documents contain relevant information but are not specific enough.
Solutions:
- Add metadata filtering for domain-specific constraints
- Implement reranking with cross-encoder models
- Use contextual compression to extract relevant parts
- Fine-tune retrieval parameters (k values, similarity thresholds)
Performance Issues
Problem: Slow response times during retrieval.
Solutions:
- Optimize vector store configuration and indexing
- Implement caching for frequently retrieved content
- Use smaller embedding models for faster inference
- Consider approximate nearest neighbor algorithms
Hallucination Prevention
Problem: AI generates information not present in retrieved documents.
Solutions:
- Improve prompt engineering to emphasize grounding
- Add verification steps to check answer alignment
- Include confidence scoring for responses
- Implement fact-checking mechanisms
Evaluation Framework
Retrieval Metrics
- Precision@k: Percentage of relevant documents in top-k results
- Recall@k: Percentage of all relevant documents found in top-k results
- Mean Reciprocal Rank (MRR): Average of the reciprocal rank of the first relevant result across queries
- Normalized Discounted Cumulative Gain (nDCG): Ranking quality metric
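nDCG is the least obvious of these to compute: DCG discounts each result's relevance grade by the log2 of its rank, and nDCG divides by the DCG of the ideal ordering. A minimal sketch with illustrative graded relevance labels:

```java
import java.util.Arrays;

public class Ndcg {
    // DCG: sum of rel_i / log2(i + 2), where i is the zero-based rank
    static double dcg(int[] rels) {
        double sum = 0;
        for (int i = 0; i < rels.length; i++) {
            sum += rels[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // nDCG: DCG of the actual ranking divided by DCG of the ideal ranking
    static double ndcg(int[] rels) {
        int[] ideal = rels.clone();
        Arrays.sort(ideal);
        // reverse the ascending sort into descending (best-first) order
        for (int i = 0; i < ideal.length / 2; i++) {
            int tmp = ideal[i];
            ideal[i] = ideal[ideal.length - 1 - i];
            ideal[ideal.length - 1 - i] = tmp;
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0 ? 0 : dcg(rels) / idealDcg;
    }

    public static void main(String[] args) {
        // relevance grades of retrieved results, in retrieval order
        System.out.println(ndcg(new int[]{3, 2, 3, 0, 1})); // below 1.0: imperfect order
        System.out.println(ndcg(new int[]{3, 3, 2, 1, 0})); // 1.0: ideal order
    }
}
```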
Answer Quality Metrics
- Faithfulness: Degree to which answers are grounded in retrieved documents
- Answer Relevance: How well answers address user questions
- Context Recall: Percentage of relevant context used in answers
- Context Precision: Percentage of retrieved context that is relevant
User Experience Metrics
- Response Time: Time from query to answer
- User Satisfaction: Feedback ratings on answer quality
- Task Completion: Rate of successful task completion
- Engagement: User interaction patterns with the system
Resources
Reference Documentation
- Vector Database Comparison - Detailed comparison of vector database options
- Embedding Models Guide - Model selection and optimization
- Retrieval Strategies - Advanced retrieval techniques
- Document Chunking - Chunking strategies and best practices
- LangChain4j RAG Guide - Official implementation patterns
Assets
- assets/vector-store-config.yaml - Configuration templates for different vector stores
- assets/retriever-pipeline.java - Complete RAG pipeline implementation
- assets/evaluation-metrics.java - Evaluation framework code
Constraints and Limitations
- Token Limits: Respect model context window limitations
- API Rate Limits: Manage external API rate limits and costs
- Data Privacy: Ensure compliance with data protection regulations
- Resource Requirements: Consider memory and computational requirements
- Maintenance: Plan for regular updates and system monitoring
Constraints and Warnings
System Constraints
- Embedding models have maximum token limits per document
- Vector databases require proper indexing for performance
- Chunk boundaries may lose context for complex documents
- Hybrid search requires additional infrastructure components
Quality Considerations
- Retrieval quality depends heavily on chunking strategy
- Embedding models may not capture domain-specific semantics
- Metadata filtering requires proper document annotation
- Reranking adds latency to query responses
Operational Warnings
- Monitor vector database storage and query performance
- Implement proper data backup and recovery procedures
- Regular embedding model updates may affect retrieval quality
- Document processing pipelines require ongoing maintenance
Security Considerations
- Never hardcode credentials: Always use environment variables or secrets managers for API keys, database passwords, and other sensitive values
- Secure access to vector databases and embedding services
- Implement proper authentication and authorization
- Validate and sanitize all external content before ingestion: documents loaded from file systems, databases, APIs, or web sources may contain malicious content that could influence model behavior through indirect prompt injection
- Apply content filtering on retrieved documents before passing them to the LLM to mitigate prompt injection risks
- Restrict allowed data source URLs and file paths using allowlists
- Monitor for abuse and unusual usage patterns
- Regular security audits and penetration testing