io.github.ggozad/haiku-rag
AI & Agents, by ggozad
An agentic Retrieval-Augmented Generation (RAG) solution built on LanceDB that supports more proactive retrieval and generation.
README
Haiku RAG
Agentic RAG built on LanceDB, Pydantic AI, and Docling.
Features
- Hybrid search — Vector + full-text with Reciprocal Rank Fusion
- Question answering — QA agents with citations (page numbers, section headings)
- Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
- Research agents — Multi-agent workflows via pydantic-graph: plan, search, evaluate, synthesize
- RLM agent — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
- Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
- Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
- Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM. QA/Research: any model supported by Pydantic AI
- Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
- CLI & Python API — Full functionality from command line or code
- MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
- Visual grounding — View chunks highlighted on original page images
- File monitoring — Watch directories and auto-index on changes
- Time travel — Query the database at any historical point with --before
- Inspector — TUI for browsing documents, chunks, and search results
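Reciprocal Rank Fusion, which the hybrid search feature above uses to merge the vector and full-text rankings, can be sketched in a few lines. This is a generic illustration of the standard RRF formula (with the conventional k = 60), not haiku-rag's internal code:

```python
# Reciprocal Rank Fusion (RRF): score each document 1 / (k + rank) in every
# ranked list it appears in, sum the scores, and sort by the fused score.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc2"]  # ranking from vector search
fts_hits = ["doc3", "doc1", "doc4"]     # ranking from full-text search
fused = rrf([vector_hits, fts_hits])
```

Documents that rank well in either list rise to the top without any score normalization, which is why RRF is a common way to fuse heterogeneous retrievers.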
Installation
Python 3.12 or newer required
Full Package (Recommended)
pip install haiku.rag
Includes all features: document processing, all embedding providers, and rerankers.
Using uv? uv pip install haiku.rag
Slim Package (Minimal Dependencies)
pip install haiku.rag-slim
Install only the extras you need. See the Installation documentation for available options.
Quick Start
Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.
# Index a PDF
haiku-rag add-src paper.pdf
# Search
haiku-rag search "attention mechanism"
# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?" --cite
# Research mode — iterative planning and search
haiku-rag research "What are the limitations of the approach?"
# RLM mode — complex analytical tasks via code execution
haiku-rag rlm "How many documents mention transformers?"
# Interactive chat — multi-turn conversations with memory
haiku-rag chat
# Watch a directory for changes
haiku-rag serve --monitor
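The session memory behind the chat command can be pictured as a rolling window over past turns. The sketch below shows that common pattern under a rough character budget; the function name and trimming policy are illustrative assumptions, not haiku-rag's implementation:

```python
# Minimal sketch of session memory: keep the most recent turns that fit a
# rough character budget, always preserving the newest message.
def trim_history(turns: list[tuple[str, str]], budget: int = 200) -> list[tuple[str, str]]:
    kept: list[tuple[str, str]] = []
    used = 0
    for role, text in reversed(turns):  # walk from newest to oldest
        if kept and used + len(text) > budget:
            break
        kept.append((role, text))
        used += len(text)
    return list(reversed(kept))  # restore chronological order

history = [
    ("user", "What is self-attention?"),
    ("assistant", "A mechanism where each token attends to all others."),
    ("user", "And its complexity?"),
]
context = trim_history(history, budget=120)      # everything fits
short = trim_history(history, budget=60)         # only the newest turn fits
```

Real systems typically count tokens rather than characters and may summarize evicted turns, but the budget-and-truncate shape is the same.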
See Configuration for customization options.
Python API
import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search — returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

asyncio.run(main())
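The structure-aware context expansion mentioned under Features can be illustrated generically: because the full document structure is stored, a matched chunk can be widened to its neighbors within the same section before being handed to the QA model. The chunk fields and expansion policy below are assumptions for illustration, not haiku-rag's API:

```python
# Illustrative sketch: expand a matched chunk to adjacent chunks that share
# its section heading, producing a more coherent context window.
from dataclasses import dataclass

@dataclass
class Chunk:
    index: int
    section: str
    text: str

def expand(chunks: list[Chunk], hit: int, window: int = 1) -> str:
    """Join the hit chunk with neighbors from the same section."""
    section = chunks[hit].section
    lo = max(0, hit - window)
    hi = min(len(chunks), hit + window + 1)
    neighbors = [c for c in chunks[lo:hi] if c.section == section]
    return " ".join(c.text for c in neighbors)

doc = [
    Chunk(0, "Intro", "We study attention."),
    Chunk(1, "Method", "Self-attention compares every pair of tokens."),
    Chunk(2, "Method", "Its cost is quadratic in sequence length."),
    Chunk(3, "Results", "We report accuracy."),
]
context = expand(doc, hit=2)
```

Here the hit in the "Method" section pulls in its preceding sibling but stops at the "Results" boundary, which is the benefit of keeping the DoclingDocument structure rather than bare chunks.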
For research agents and chat, see the Agents docs.
MCP Server
Use with AI assistants like Claude Desktop:
haiku-rag serve --mcp --stdio
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
Provides tools for document management, search, QA, and research directly in your AI assistant.
Examples
See the examples directory for working examples:
- Docker Setup - Complete Docker deployment with file monitoring and MCP server
- Web Application - Full-stack conversational RAG with CopilotKit frontend
Documentation
Full documentation at: https://ggozad.github.io/haiku.rag/
- Installation - Provider setup
- Architecture - System overview
- Configuration - YAML configuration
- CLI - Command reference
- Python API - Complete API docs
- Agents - QA and research agents
- RLM Agent - Complex analytical tasks via code execution
- Applications - Chat TUI, web app, and inspector
- Server - File monitoring and MCP
- MCP - Model Context Protocol integration
- Benchmarks - Performance benchmarks
- Changelog - Version history
License
This project is licensed under the MIT License.
<!-- mcp-name is used by the MCP registry to identify this server -->
mcp-name: io.github.ggozad/haiku-rag