pydantic-ai-testing

by anderskev

Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.

3.7k · Coding & Debugging · Not scanned · March 30, 2026

Installation

claude skill add --url https://github.com/openclaw/skills

Documentation

Testing PydanticAI Agents

TestModel (Deterministic Testing)

Use TestModel for tests without API calls:

python
import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

def test_agent_basic():
    agent = Agent('openai:gpt-4o')

    # Override with TestModel for testing
    result = agent.run_sync('Hello', model=TestModel())

    # TestModel generates deterministic output based on output_type
    assert isinstance(result.output, str)

TestModel Configuration

python
from pydantic_ai.models.test import TestModel

# Custom text output
model = TestModel(custom_output_text='Custom response')
result = agent.run_sync('Hello', model=model)
assert result.output == 'Custom response'

# Custom structured output (for output_type agents)
from pydantic import BaseModel

class Response(BaseModel):
    message: str
    score: int

agent = Agent('openai:gpt-4o', output_type=Response)
model = TestModel(custom_output_args={'message': 'Test', 'score': 42})
result = agent.run_sync('Hello', model=model)
assert result.output.message == 'Test'

# Seed for reproducible random output
model = TestModel(seed=42)

# Force tool calls
model = TestModel(call_tools=['my_tool', 'another_tool'])

Override Context Manager

python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

# MyDeps and MockDB are placeholders for your application's dependency types.
agent = Agent('openai:gpt-4o', deps_type=MyDeps)

def test_with_override():
    mock_deps = MyDeps(db=MockDB())

    with agent.override(model=TestModel(), deps=mock_deps):
        # All runs use TestModel and mock_deps
        result = agent.run_sync('Hello')
        assert result.output

FunctionModel (Custom Logic)

For complete control over model responses:

python
from pydantic_ai import Agent, ModelMessage, ModelResponse, TextPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def custom_model(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    """Custom model that inspects messages and returns response."""
    # Access the last user message
    last_msg = messages[-1]

    # Return custom response
    return ModelResponse(parts=[TextPart('Custom response')])

agent = Agent(FunctionModel(custom_model))
result = agent.run_sync('Hello')

FunctionModel with Tool Calls

python
from pydantic_ai import Agent, ModelMessage, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def model_with_tools(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    # First request: call a tool
    if len(messages) == 1:
        return ModelResponse(parts=[
            ToolCallPart(
                tool_name='get_data',
                args='{"id": 123}'
            )
        ])

    # After tool response: return final result
    return ModelResponse(parts=[TextPart('Done with tool result')])

agent = Agent(FunctionModel(model_with_tools))

@agent.tool_plain
def get_data(id: int) -> str:
    return f"Data for {id}"

result = agent.run_sync('Get data')

VCR Cassettes (Recorded API Calls)

Record and replay real LLM API interactions:

python
import pytest
from pydantic_ai import Agent

@pytest.mark.vcr
def test_with_recorded_response():
    """Uses recorded cassette from tests/cassettes/"""
    agent = Agent('openai:gpt-4o')
    result = agent.run_sync('Hello')
    assert 'hello' in result.output.lower()

# To record/update cassettes:
# uv run pytest --record-mode=rewrite tests/test_file.py

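Cassettes are saved as YAML under tests/cassettes/ and replayed on later runs, so a recorded test makes no network calls. The vcr marker and the --record-mode flag come from a pytest VCR plugin; the flag spelling matches pytest-recording.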
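
Recorded cassettes contain the full HTTP exchange, including request headers, so credentials are worth scrubbing before cassettes are committed. The sketch below assumes the plugin in use is pytest-recording, which reads a `vcr_config` fixture; the constant name `VCR_CONFIG` and the exact header list are assumptions to adapt to your provider.

```python
# conftest.py
import pytest

# Assumed header names: 'authorization' carries bearer tokens for many
# providers, 'x-api-key' is used by some others. Adjust as needed.
VCR_CONFIG = {
    'filter_headers': ['authorization', 'x-api-key'],
}


@pytest.fixture(scope='module')
def vcr_config():
    """Configuration picked up by pytest-recording for every vcr-marked test."""
    return VCR_CONFIG
```

After adding the filter, re-record existing cassettes so that previously captured headers are removed from the YAML files.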

Inline Snapshots

Assert expected outputs with auto-updating snapshots:

python
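from inline_snapshot import snapshot
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

def test_agent_output():
    result = agent.run_sync('Hello', model=TestModel())

    # Start with an empty snapshot() and run with --inline-snapshot=create
    # to write the recorded value into this file.
    # Later runs assert against the stored value.
    assert result.output == snapshot('success (no tool calls)')

# Create missing snapshots:
# uv run pytest --inline-snapshot=create
# Update snapshots whose values changed:
# uv run pytest --inline-snapshot=fix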

Testing Tools

python
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

def test_tool_is_called():
    agent = Agent('openai:gpt-4o')
    tool_called = False

    @agent.tool_plain
    def my_tool(x: int) -> str:
        nonlocal tool_called
        tool_called = True
        return f"Result: {x}"

    # Force TestModel to call the tool
    result = agent.run_sync(
        'Use my_tool',
        model=TestModel(call_tools=['my_tool'])
    )

    assert tool_called

Testing with Dependencies

python
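from dataclasses import dataclass
from typing import Protocol
from unittest.mock import AsyncMock

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

class ApiClient(Protocol):
    """Placeholder for the application's real API client."""
    async def fetch(self) -> dict: ...

@dataclass
class Deps:
    api: ApiClient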

def test_tool_with_deps():
    # Create mock dependency
    mock_api = AsyncMock()
    mock_api.fetch.return_value = {'data': 'test'}

    agent = Agent('openai:gpt-4o', deps_type=Deps)

    @agent.tool
    async def fetch_data(ctx: RunContext[Deps]) -> dict:
        return await ctx.deps.api.fetch()

    with agent.override(
        model=TestModel(call_tools=['fetch_data']),
        deps=Deps(api=mock_api)
    ):
        result = agent.run_sync('Fetch data')

    mock_api.fetch.assert_called_once()

Capture Messages

Inspect all messages in a run:

python
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

with capture_run_messages() as messages:
    result = agent.run_sync('Hello', model=TestModel())

# Inspect captured messages
for msg in messages:
    print(msg)

Testing Patterns Summary

| Scenario | Approach |
|---|---|
| Unit tests without API | TestModel() |
| Custom model logic | FunctionModel(func) |
| Recorded real responses | @pytest.mark.vcr |
| Assert output structure | inline_snapshot |
| Test tools are called | TestModel(call_tools=[...]) |
| Mock dependencies | agent.override(deps=...) |

pytest Configuration

Typical pyproject.toml:

toml
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"  # Requires pytest-asyncio; runs async tests without a marker

Run tests:

bash
uv run pytest tests/test_agent.py -v
uv run pytest --inline-snapshot=fix  # Update snapshots
