agent-learner
by BytesAgain
Benchmark and compare agent prompts and evaluation results. Use when tuning strategies, evaluating outputs, or comparing configurations.
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/bytesagain/ba-agent-learner
Documentation
Agent Learner
An AI toolkit for configuring, benchmarking, comparing, and optimizing agent prompts and evaluation results. Agent Learner provides persistent, file-based logging for each command category with timestamped entries, summary statistics, multi-format export, and full-text search across all records.
Commands
| Command | Description |
|---|---|
| configure | Configure agent settings — log configuration entries or view recent ones |
| benchmark | Benchmark agent performance — log benchmark results or view history |
| compare | Compare agent outputs — log comparison data or view recent comparisons |
| prompt | Prompt management — log prompt variations or view recent prompts |
| evaluate | Evaluate agent outputs — log evaluation results or view history |
| fine-tune | Fine-tune parameters — log fine-tuning sessions or view recent ones |
| analyze | Analyze agent behavior — log analysis entries or view recent analyses |
| cost | Cost tracking — log cost data or view recent cost entries |
| usage | Usage monitoring — log usage metrics or view recent usage data |
| optimize | Optimize configurations — log optimization runs or view history |
| test | Test agent behavior — log test results or view recent tests |
| report | Report generation — log report entries or view recent reports |
| stats | Show summary statistics across all log categories (entry counts, data size, first entry date) |
| export <fmt> | Export all data in json, csv, or txt format to the data directory |
| search <term> | Full-text search across all log files (case-insensitive) |
| recent | Show the 20 most recent entries from the activity history log |
| status | Health check — show version, data directory, total entries, disk usage, and last activity |
| help | Show the full help message with all available commands |
| version | Print the current version string |
Each data command (configure, benchmark, compare, etc.) works in two modes:
- Without arguments: displays the 20 most recent entries from that category
- With arguments: saves the input as a new timestamped entry and reports the total count
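For example, a minimal sketch of the two modes using the evaluate command (the entry text below is illustrative, not real output):
# Log a new timestamped evaluation entry
agent-learner evaluate "Run 14: rubric score 4/5, two hallucinations flagged"
# List the 20 most recent evaluation entries
agent-learner evaluate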
Data Storage
All data is stored in plain text files under the data directory:
- Category logs: $DATA_DIR/<command>.log — one file per command (e.g., configure.log, benchmark.log, prompt.log); each entry is timestamp|value
- History log: $DATA_DIR/history.log — audit trail of every command executed, with timestamps
- Export files: $DATA_DIR/export.<fmt> — generated by the export command in json, csv, or txt format
Default data directory: ~/.local/share/agent-learner/
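As an illustration, a line in benchmark.log under the default data directory might look like the following; the timestamp|value layout comes from the description above, but the exact timestamp format shown is an assumption:
# Hypothetical entry in ~/.local/share/agent-learner/benchmark.log (timestamp format assumed)
2025-03-14T09:21:05Z|GPT-4o on MMLU: 88.7% accuracy, 1.2s avg latency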
Requirements
- Bash (with set -euo pipefail support)
- Standard Unix utilities: grep, cat, date, echo, wc, du, head, tail, basename
- No external dependencies or API keys required
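To confirm the prerequisites are available before installing, a small sketch (not part of the tool itself) that reports any missing utility:
# Print a warning for each required utility missing from PATH; silent if all are present
for cmd in grep cat date echo wc du head tail basename; do
  command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done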
When to Use
- Benchmarking agent performance — When you need to track and compare benchmark results across different agent configurations, models, or prompt strategies
- Prompt engineering iteration — When you're testing multiple prompt variations and want to log each version with results for later comparison
- Cost and usage tracking — When you need to monitor API costs and usage metrics over time to optimize spending
- Fine-tuning experiments — When running fine-tuning sessions and you want to log parameters, results, and observations for reproducibility
- Cross-category analysis — When you need to search across all logged data (benchmarks, prompts, evaluations, costs) to find patterns or specific entries
Examples
# Initialize and check status
agent-learner status
# Log a benchmark result
agent-learner benchmark "GPT-4o on MMLU: 88.7% accuracy, 1.2s avg latency"
# Log a prompt variation
agent-learner prompt "System: You are a helpful coding assistant. Always explain your reasoning step by step."
# Compare two configurations
agent-learner compare "GPT-4o vs Claude-3.5: GPT-4o 12% faster, Claude 5% more accurate on code tasks"
# Track costs
agent-learner cost "March batch: 12,450 tokens input, 3,200 tokens output, $0.47 total"
# View all recent benchmarks
agent-learner benchmark
# Search across all logs for a specific term
agent-learner search "accuracy"
# Export all data as JSON
agent-learner export json
# View summary statistics
agent-learner stats
# Show recent activity
agent-learner recent
Output
All commands print their output to stdout. Export files are written to the data directory:
agent-learner export json # → ~/.local/share/agent-learner/export.json
agent-learner export csv # → ~/.local/share/agent-learner/export.csv
agent-learner export txt # → ~/.local/share/agent-learner/export.txt
Every command execution is logged to $DATA_DIR/history.log for auditing purposes.
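Because the history log is plain text, it can be inspected directly with standard tools; a minimal sketch assuming the default data directory (the per-line layout of history.log is not specified here):
# Show the five most recent audit-trail entries
tail -n 5 ~/.local/share/agent-learner/history.log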
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
Related Skills
Claude API
by anthropics
For development work that integrates the Claude API, Anthropic SDK, or Agent SDK: it detects the project language automatically and supplies matching examples and default configuration so you can build LLM applications quickly.
✎ If you want to wire Claude capabilities into an app or agent, claude-api gets you started quickly, is compatible with the Anthropic and Agent SDKs, and keeps the integration path clear and painless.
Prompt Engineering Expert
by alirezarezvani
Covers prompt optimization, few-shot design, structured output, RAG evaluation, and agent workflow orchestration; well suited to analyzing token costs, assessing LLM output quality, and building production-ready AI agent systems.
✎ Strings prompt optimization, LLM evaluation, RAG, and agent design into a single methodology; a good fit for anyone who wants to improve AI development efficiency systematically.
Agent Workflow Design
by alirezarezvani
Targets production-grade multi-agent orchestration, laying out five workflow designs (sequential, parallel, hierarchical, event-driven, and consensus) and covering handoffs, state management, fault tolerance and retries, context budgeting, and cost optimization; suited to building complex AI collaboration systems.
✎ Helps you bring multi-agent workflow design, orchestration, and automation under one roof so complex workflows land more reliably; a good fit for teams that want tight control.
Related MCP Servers
Sequential Thinking
Editor's Pick · by Anthropic
Sequential Thinking is a reference server that lets AI work through complex problems via dynamic chains of thought.
✎ This server shows how to get Claude to reason step by step the way a person would, and it is a good way for developers to study a chain-of-thought implementation in MCP. Keep in mind that it is only a reference example, not something to drop straight into production.
Knowledge Graph Memory
Editor's Pick · by Anthropic
Memory is a persistent memory system built on a local knowledge graph that lets AI retain long-term context.
✎ Fills the "forgets everything" gap for AI and agents: long-term context accumulates in a local knowledge graph, multi-turn conversations get smarter, and the data stays under your control.
PraisonAI
Editor's Pick · by mervinpraison
PraisonAI is a low-code AI agent framework with support for self-reflection and multiple LLMs.
✎ If you need to quickly stand up a team of AI agents that run 24/7 on complex tasks (such as automated research or code generation), PraisonAI's low-code design and multi-platform integrations (e.g., Telegram) make it very quick to get going. As an unofficial project, though, its ecosystem is likely less mature than mainstream frameworks like LangChain, so it suits developers who are happy to experiment.