Performance Benchmark Testing

benchmark

by BytesAgain

Run performance benchmarks and stress tests using Python profiling tools. Use when you need to measure, compare, or analyze system and application performance.

3.9k · Other · Not scanned · March 23, 2026

Installation

claude skill add --url github.com/openclaw/skills/tree/main/skills/bytesagain/benchmark

Documentation

Benchmark — Performance Benchmark Testing Tool

A comprehensive performance benchmarking skill for running CPU, memory, disk, and network tests. Supports comparison between runs, historical tracking, profiling, and stress testing. All results are stored in JSONL format.

Prerequisites

  • bash (v4+)
  • python3 (v3.6+)
  • Standard system utilities (dd, time, etc.)

Environment Variables

| Variable | Required | Description |
|---|---|---|
| BENCH_TYPE | No | Benchmark type: cpu, memory, disk, network (default: cpu) |
| BENCH_DURATION | No | Duration in seconds for stress tests (default: 10) |
| BENCH_THREADS | No | Number of threads for parallel tests (default: 1) |
| BENCH_ID | No | Specific benchmark ID for comparison/lookup |
| BENCH_TAG | No | Tag for organizing benchmark runs |
| BENCH_FORMAT | No | Export format: json, csv (default: json) |
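The following is a minimal Python sketch of how these variables and their defaults might be resolved. The real script is bash, and its validation is not documented. The allowed-value checks and the positive-number rule below are assumptions drawn from the table, and `load_settings` is a hypothetical name.

```python
import os

# Defaults taken from the Environment Variables table.
DEFAULTS = {
    "BENCH_TYPE": "cpu",
    "BENCH_DURATION": "10",
    "BENCH_THREADS": "1",
    "BENCH_FORMAT": "json",
}
VALID_TYPES = {"cpu", "memory", "disk", "network"}
VALID_FORMATS = {"json", "csv"}


def load_settings(env=None):
    """Resolve BENCH_* settings from an environment mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}

    # Assumed validation: reject values the table does not list.
    if settings["BENCH_TYPE"] not in VALID_TYPES:
        raise ValueError(f"unknown BENCH_TYPE: {settings['BENCH_TYPE']!r}")
    if settings["BENCH_FORMAT"] not in VALID_FORMATS:
        raise ValueError(f"unknown BENCH_FORMAT: {settings['BENCH_FORMAT']!r}")

    # Environment values are strings; convert the numeric ones.
    settings["BENCH_DURATION"] = int(settings["BENCH_DURATION"])
    settings["BENCH_THREADS"] = int(settings["BENCH_THREADS"])
    if settings["BENCH_DURATION"] <= 0 or settings["BENCH_THREADS"] <= 0:
        raise ValueError("BENCH_DURATION and BENCH_THREADS must be positive")

    # Optional variables without defaults stay None when unset.
    settings["BENCH_ID"] = env.get("BENCH_ID")
    settings["BENCH_TAG"] = env.get("BENCH_TAG")
    return settings
```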

Data Storage

  • Results: ~/.benchmark/data.jsonl
  • Config: ~/.benchmark/config.json
  • Reports: ~/.benchmark/reports/
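Results are stored as JSONL, with one JSON object per line, so the standard library is enough to read and append them. This sketch assumes each line holds one run record. `load_results` and `append_result` are hypothetical helpers, not part of the skill.

```python
import json
from pathlib import Path

# Default results file listed under Data Storage.
RESULTS_FILE = Path.home() / ".benchmark" / "data.jsonl"


def load_results(path=RESULTS_FILE):
    """Return every record in a JSONL file, or [] if the file does not exist yet."""
    path = Path(path)
    if not path.exists():
        return []
    records = []
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return records


def append_result(record, path=RESULTS_FILE):
    """Append one record as a single JSON line, creating the directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```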

Commands

run

Run a benchmark of the type set in BENCH_TYPE. BENCH_TAG optionally labels the run so it can be filtered later.

```bash
BENCH_TYPE="cpu" BENCH_TAG="baseline" scripts/script.sh run
```
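The scoring method behind `run` is not documented. As an illustration only, this sketch times a fixed CPU-bound workload for a set duration and reports iterations per second as the score. It uses the field names shown under Output Format. The workload and the scoring rule are assumptions.

```python
import time


def cpu_workload(n=200_000):
    """A fixed, deterministic CPU-bound task: sum of squares."""
    return sum(i * i for i in range(n))


def run_cpu_benchmark(duration_s=1.0):
    """Repeat the workload until duration_s elapses and return a result record.

    Score = iterations per second (higher is faster). This scoring rule is a
    hypothetical choice for illustration, not the skill's documented formula.
    """
    iterations = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        cpu_workload()
        iterations += 1
    elapsed = time.perf_counter() - start
    return {
        "type": "cpu",
        "score": round(iterations / elapsed, 2),
        "duration_ms": int(elapsed * 1000),
        "metrics": {"iterations": iterations},
    }
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.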

compare

Compare two benchmark runs side by side. Set BENCH_ID and BENCH_ID2 to the IDs of the two runs.

```bash
BENCH_ID="bench_a" BENCH_ID2="bench_b" scripts/script.sh compare
```
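This sketch shows what a side-by-side comparison could compute from two stored records. It assumes both records carry the `type` and `score` fields shown under Output Format. The skill does not say whether higher scores are better for every benchmark type, so this hypothetical `compare_runs` helper reports the change without judging it.

```python
def compare_runs(a, b):
    """Return the absolute and relative change in score from run a to run b."""
    if a.get("type") != b.get("type"):
        raise ValueError("cannot compare runs of different benchmark types")
    delta = b["score"] - a["score"]
    # Guard against division by zero when the baseline score is 0.
    pct = (delta / a["score"] * 100.0) if a["score"] else float("nan")
    return {
        "id_a": a.get("id"),
        "id_b": b.get("id"),
        "score_a": a["score"],
        "score_b": b["score"],
        "delta": round(delta, 2),
        "percent_change": round(pct, 2),
    }
```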

history

Show benchmark run history, optionally filtered by BENCH_TYPE and BENCH_TAG.

```bash
BENCH_TYPE="cpu" BENCH_TAG="baseline" scripts/script.sh history
```
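History filtering amounts to selecting records by type and tag. This sketch assumes each record stores its tag under a `tag` key. That field is hypothetical, since the Output Format example does not show it.

```python
def filter_history(records, bench_type=None, tag=None):
    """Return records matching the optional type and tag filters, in stored order."""
    return [
        r
        for r in records
        if (bench_type is None or r.get("type") == bench_type)
        and (tag is None or r.get("tag") == tag)  # "tag" key is an assumption
    ]
```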

report

Generate a detailed performance report for the run identified by BENCH_ID.

```bash
BENCH_ID="bench_abc123" scripts/script.sh report
```
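This sketch renders one record as a plain-text report and saves it under the documented `~/.benchmark/reports/` directory. The layout and the `<id>.txt` file name are assumptions, because the actual report contents are not specified. `render_report` and `save_report` are hypothetical names.

```python
from pathlib import Path


def render_report(record):
    """Format one benchmark record (Output Format fields) as plain text."""
    lines = [
        f"Benchmark report: {record['id']}",
        f"Type:      {record['type']}",
        f"Score:     {record['score']}",
        f"Duration:  {record['duration_ms'] / 1000:.1f} s",
    ]
    metrics = record.get("metrics") or {}
    if metrics:
        lines.append("Metrics:")
        lines.extend(f"  {name}: {value}" for name, value in sorted(metrics.items()))
    return "\n".join(lines) + "\n"


def save_report(record, reports_dir=Path.home() / ".benchmark" / "reports"):
    """Write the report to <reports_dir>/<id>.txt (hypothetical naming) and return the path."""
    reports_dir = Path(reports_dir)
    reports_dir.mkdir(parents=True, exist_ok=True)
    path = reports_dir / f"{record['id']}.txt"
    path.write_text(render_report(record), encoding="utf-8")
    return path
```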

profile

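Run a detailed profiling session with a breakdown of the results. BENCH_TYPE selects the benchmark type and BENCH_DURATION sets the session length in seconds.

```bash
BENCH_TYPE="cpu" BENCH_DURATION="30" scripts/script.sh profile
```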
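The skill description mentions Python profiling tools without naming them. Assuming the standard-library `cProfile` and `pstats` modules, this sketch runs a profiling pass that lists the most expensive calls by cumulative time. `workload` is a placeholder for the code being measured, and `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def workload():
    # Placeholder for the code under test.
    return sorted(str(i) for i in range(50_000))


def profile_call(func, top=5):
    """Profile one call of func; return (result, report of the top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    # "cumulative" sorts by time spent in a function including its callees.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(workload)
    print(report)
```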

stress

Run a sustained stress test for BENCH_DURATION seconds using BENCH_THREADS parallel threads.

```bash
BENCH_TYPE="cpu" BENCH_DURATION="60" BENCH_THREADS="4" scripts/script.sh stress
```
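For CPU stress, Python threads cannot load more than one core with pure-Python work because of the GIL. This sketch therefore starts one child interpreter per worker. It shows one possible way to implement a CPU stress test and is not the skill's own code. `stress_cpu` is hypothetical, and `workers` plays the role of BENCH_THREADS.

```python
import subprocess
import sys
import textwrap

# Child program: busy-loop for argv[1] seconds, then print the number of loop passes.
_BUSY_LOOP = textwrap.dedent("""
    import sys, time
    deadline = time.perf_counter() + float(sys.argv[1])
    passes = 0
    while time.perf_counter() < deadline:
        passes += 1
    print(passes)
""")


def stress_cpu(duration_s=10, workers=1):
    """Keep `workers` CPU cores busy for duration_s seconds; return total loop passes.

    Each worker is a separate Python process, so the workers run in parallel
    instead of taking turns on the GIL as threads would.
    """
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", _BUSY_LOOP, str(duration_s)],
            stdout=subprocess.PIPE,
        )
        for _ in range(workers)
    ]
    total = 0
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker exited with code {proc.returncode}")
        total += int(out.decode().strip())
    return total
```

For example, `stress_cpu(duration_s=60, workers=4)` corresponds roughly to the `stress` invocation above. The returned pass count is only a rough throughput indicator.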

config

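View or update the benchmark configuration stored in ~/.benchmark/config.json. To change a setting, set BENCH_KEY to its name and BENCH_VALUE to the new value.

```bash
BENCH_KEY="default_duration" BENCH_VALUE="30" scripts/script.sh config
```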
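This sketch performs a read-modify-write update of `~/.benchmark/config.json`. Environment variables are always strings, so `parse_value` decodes values such as `"30"` as JSON when possible. Whether the real script does this is an assumption. Both `parse_value` and `set_config` are hypothetical.

```python
import json
from pathlib import Path

CONFIG_FILE = Path.home() / ".benchmark" / "config.json"


def parse_value(raw):
    """Decode a string as JSON ("30" -> 30, "true" -> True); otherwise keep it as text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def set_config(key, value, path=CONFIG_FILE):
    """Set one key in the JSON config file, keeping the other keys; return the new config."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then replace, so an interrupted write cannot truncate the config.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return config
```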

export

Export benchmark data in the format set by BENCH_FORMAT (json or csv).

```bash
BENCH_FORMAT="csv" scripts/script.sh export
```
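CSV is flat, so nested fields need special handling. This sketch assumes records shaped like the Output Format example. It writes the documented top-level fields as columns and JSON-encodes `metrics` into a single column. That column layout is an assumption, and `export_csv` is a hypothetical name.

```python
import csv
import io
import json


def export_csv(records):
    """Render records as CSV text; the nested 'metrics' dict becomes one JSON-encoded column."""
    columns = ["id", "type", "score", "duration_ms", "metrics"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops any extra keys a record may carry.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        row["metrics"] = json.dumps(record.get("metrics", {}), sort_keys=True)
        writer.writerow(row)
    return buffer.getvalue()
```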

list

List all benchmark runs.

```bash
scripts/script.sh list
```

status

Show benchmarking system status and summary.

```bash
scripts/script.sh status
```
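The contents of the status summary are not documented. This sketch shows one plausible summary: it counts runs per benchmark type and reports the most recently appended run. It assumes lines in `data.jsonl` are stored in run order. `summarize` is a hypothetical name.

```python
from collections import Counter


def summarize(records):
    """Return total runs, runs per type, and the ID of the last record in stored order."""
    by_type = Counter(r.get("type", "unknown") for r in records)
    return {
        "total_runs": len(records),
        "runs_by_type": dict(sorted(by_type.items())),
        "latest_id": records[-1].get("id") if records else None,
    }
```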

help

Display usage information.

```bash
scripts/script.sh help
```

version

Display current version.

```bash
scripts/script.sh version
```

Output Format

All commands output structured JSON to stdout:

```json
{
  "status": "success",
  "command": "run",
  "data": {
    "id": "bench_20240101_120000_abc123",
    "type": "cpu",
    "score": 15234.5,
    "duration_ms": 10000,
    "metrics": {}
  }
}
```
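Every command writes a JSON envelope to stdout, so callers can check `status` before using `data`. The shape of a failure envelope is not documented, so this sketch treats any status other than `"success"` as an error. `parse_output` is a hypothetical helper.

```python
import json


def parse_output(stdout_text):
    """Parse a command's JSON envelope and return its 'data' payload.

    Raises RuntimeError when status is anything other than "success".
    """
    envelope = json.loads(stdout_text)
    if envelope.get("status") != "success":
        raise RuntimeError(f"{envelope.get('command')} failed: {envelope}")
    return envelope.get("data", {})
```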

Error Handling

| Exit Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Missing required parameter |
| 3 | Benchmark not found |
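Scripts that call the tool can branch on these exit codes. This Python sketch runs any command and maps its exit code to the meanings in the table. `run_command` is a hypothetical helper.

```python
import subprocess

# Exit codes from the Error Handling table.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Missing required parameter",
    3: "Benchmark not found",
}


def run_command(argv, env=None):
    """Run a command and return (exit_code, meaning, stdout_text)."""
    proc = subprocess.run(argv, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    meaning = EXIT_CODES.get(proc.returncode, f"Unrecognized exit code {proc.returncode}")
    return proc.returncode, meaning, proc.stdout.decode()
```

For example, calling it with `["scripts/script.sh", "version"]` returns the exit code, its meaning, and the JSON text written to stdout.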

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

