io.github.ggozad/haiku-rag
AI & Agents, by ggozad
An agentic Retrieval-Augmented Generation (RAG) solution built on LanceDB that supports more proactive retrieval and generation.
README
Haiku RAG
Agentic RAG built on LanceDB, Pydantic AI, and Docling.
Features
- Hybrid search — Vector + full-text with Reciprocal Rank Fusion
- Question answering — QA agents with citations (page numbers, section headings)
- Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
- Research agents — Multi-agent workflows via pydantic-graph: plan, search, evaluate, synthesize
- RLM agent — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
- Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
- Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
- Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM. QA/Research: any model supported by Pydantic AI
- Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
- CLI & Python API — Full functionality from command line or code
- MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
- Visual grounding — View chunks highlighted on original page images
- File monitoring — Watch directories and auto-index on changes
- Time travel — Query the database at any historical point with --before
- Inspector — TUI for browsing documents, chunks, and search results
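Reciprocal Rank Fusion, which the hybrid search feature above uses to merge the vector and full-text rankings, can be sketched in a few lines. This is a generic illustration of the standard RRF formula (with the conventional k = 60), not haiku-rag's internal code:

```python
# Reciprocal Rank Fusion (RRF): score each document 1 / (k + rank) in every
# ranked list it appears in, sum the scores, and sort by the fused score.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc2"]  # ranking from vector search
fts_hits = ["doc3", "doc1", "doc4"]     # ranking from full-text search
fused = rrf([vector_hits, fts_hits])
```

Documents that rank well in either list rise to the top without any score normalization, which is why RRF is a common way to fuse heterogeneous retrievers.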
Installation
Python 3.12 or newer required
Full Package (Recommended)
pip install haiku.rag
Includes all features: document processing, all embedding providers, and rerankers.
Using uv? uv pip install haiku.rag
Slim Package (Minimal Dependencies)
pip install haiku.rag-slim
Install only the extras you need. See the Installation documentation for available options.
Quick Start
Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.
# Index a PDF
haiku-rag add-src paper.pdf
# Search
haiku-rag search "attention mechanism"
# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?" --cite
# Research mode — iterative planning and search
haiku-rag research "What are the limitations of the approach?"
# RLM mode — complex analytical tasks via code execution
haiku-rag rlm "How many documents mention transformers?"
# Interactive chat — multi-turn conversations with memory
haiku-rag chat
# Watch a directory for changes
haiku-rag serve --monitor
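The session memory behind the chat command can be pictured as a rolling window over past turns. The sketch below shows that common pattern under a rough character budget; the function name and trimming policy are illustrative assumptions, not haiku-rag's implementation:

```python
# Minimal sketch of session memory: keep the most recent turns that fit a
# rough character budget, always preserving the newest message.
def trim_history(turns: list[tuple[str, str]], budget: int = 200) -> list[tuple[str, str]]:
    kept: list[tuple[str, str]] = []
    used = 0
    for role, text in reversed(turns):  # walk from newest to oldest
        if kept and used + len(text) > budget:
            break
        kept.append((role, text))
        used += len(text)
    return list(reversed(kept))  # restore chronological order

history = [
    ("user", "What is self-attention?"),
    ("assistant", "A mechanism where each token attends to all others."),
    ("user", "And its complexity?"),
]
context = trim_history(history, budget=120)      # everything fits
short = trim_history(history, budget=60)         # only the newest turn fits
```

Real systems typically count tokens rather than characters and may summarize evicted turns, but the budget-and-truncate shape is the same.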
See Configuration for customization options.
Python API
import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search — returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

asyncio.run(main())
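The structure-aware context expansion mentioned under Features can be illustrated generically: because the full document structure is stored, a matched chunk can be widened to its neighbors within the same section before being handed to the QA model. The chunk fields and expansion policy below are assumptions for illustration, not haiku-rag's API:

```python
# Illustrative sketch: expand a matched chunk to adjacent chunks that share
# its section heading, producing a more coherent context window.
from dataclasses import dataclass

@dataclass
class Chunk:
    index: int
    section: str
    text: str

def expand(chunks: list[Chunk], hit: int, window: int = 1) -> str:
    """Join the hit chunk with neighbors from the same section."""
    section = chunks[hit].section
    lo = max(0, hit - window)
    hi = min(len(chunks), hit + window + 1)
    neighbors = [c for c in chunks[lo:hi] if c.section == section]
    return " ".join(c.text for c in neighbors)

doc = [
    Chunk(0, "Intro", "We study attention."),
    Chunk(1, "Method", "Self-attention compares every pair of tokens."),
    Chunk(2, "Method", "Its cost is quadratic in sequence length."),
    Chunk(3, "Results", "We report accuracy."),
]
context = expand(doc, hit=2)
```

Here the hit in the "Method" section pulls in its preceding sibling but stops at the "Results" boundary, which is the benefit of keeping the DoclingDocument structure rather than bare chunks.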
For research agents and chat, see the Agents docs.
MCP Server
Use with AI assistants like Claude Desktop:
haiku-rag serve --mcp --stdio
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
Provides tools for document management, search, QA, and research directly in your AI assistant.
Examples
See the examples directory for working examples:
- Docker Setup - Complete Docker deployment with file monitoring and MCP server
- Web Application - Full-stack conversational RAG with CopilotKit frontend
Documentation
Full documentation at: https://ggozad.github.io/haiku.rag/
- Installation - Provider setup
- Architecture - System overview
- Configuration - YAML configuration
- CLI - Command reference
- Python API - Complete API docs
- Agents - QA and research agents
- RLM Agent - Complex analytical tasks via code execution
- Applications - Chat TUI, web app, and inspector
- Server - File monitoring and MCP
- MCP - Model Context Protocol integration
- Benchmarks - Performance benchmarks
- Changelog - Version history
License
This project is licensed under the MIT License.
<!-- mcp-name is used by the MCP registry to identify this server -->
mcp-name: io.github.ggozad/haiku-rag