Performance Benchmarking
benchmark
by BytesAgain
Run performance benchmarks and stress tests using Python profiling tools. Use when you need to measure, compare, or analyze system and application performance.
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/bytesagain/benchmark
Documentation
Benchmark — Performance Benchmark Testing Tool
A comprehensive performance benchmarking skill for running CPU, memory, disk, and network tests. Supports comparison between runs, historical tracking, profiling, and stress testing. All results are stored in JSONL format.
Prerequisites
- bash (v4+)
- python3 (v3.6+)
- Standard system utilities (dd, time, etc.)
Environment Variables
| Variable | Required | Description |
|---|---|---|
| BENCH_TYPE | No | Benchmark type: cpu, memory, disk, network (default: cpu) |
| BENCH_DURATION | No | Duration in seconds for stress tests (default: 10) |
| BENCH_THREADS | No | Number of threads for parallel tests (default: 1) |
| BENCH_ID | No | Specific benchmark ID for comparison/lookup |
| BENCH_TAG | No | Tag for organizing benchmark runs |
| BENCH_FORMAT | No | Export format: json, csv (default: json) |
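The defaults in the table above can be mirrored in a small Python helper, which may be handy when wrapping the script from another tool. This is only a sketch: `resolve_settings` and the validation rules are hypothetical and are not taken from `scripts/script.sh`; only the variable names, allowed values, and defaults come from the table.

```python
import os

# Defaults copied from the environment variable table.
DEFAULTS = {
    "BENCH_TYPE": "cpu",
    "BENCH_DURATION": "10",
    "BENCH_THREADS": "1",
    "BENCH_FORMAT": "json",
}
VALID_TYPES = {"cpu", "memory", "disk", "network"}
VALID_FORMATS = {"json", "csv"}


def resolve_settings(env=None):
    """Hypothetical helper: return settings with documented defaults applied."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}

    # Assumption: unknown values are rejected rather than silently accepted.
    if settings["BENCH_TYPE"] not in VALID_TYPES:
        raise ValueError("unsupported BENCH_TYPE: %r" % settings["BENCH_TYPE"])
    if settings["BENCH_FORMAT"] not in VALID_FORMATS:
        raise ValueError("unsupported BENCH_FORMAT: %r" % settings["BENCH_FORMAT"])

    settings["BENCH_DURATION"] = int(settings["BENCH_DURATION"])
    settings["BENCH_THREADS"] = int(settings["BENCH_THREADS"])

    # These two have no documented default, so they stay None when unset.
    settings["BENCH_ID"] = env.get("BENCH_ID")
    settings["BENCH_TAG"] = env.get("BENCH_TAG")
    return settings


print(resolve_settings({"BENCH_TYPE": "disk", "BENCH_THREADS": "4"}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.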
Data Storage
- Results: ~/.benchmark/data.jsonl
- Config: ~/.benchmark/config.json
- Reports: ~/.benchmark/reports/
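Because results are stored as JSONL (one JSON object per line), they can be read back with the standard library alone. The sketch below assumes nothing about the fields inside each record; `parse_jsonl` and `load_results` are hypothetical names, not part of the skill.

```python
import json
from pathlib import Path

# Path taken from the Data Storage list.
RESULTS_PATH = Path.home() / ".benchmark" / "data.jsonl"


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_results(path=RESULTS_PATH):
    """Return every stored record, or an empty list if the file is absent."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return parse_jsonl(handle)


print(len(load_results()), "stored benchmark records")
```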
Commands
run
Execute a benchmark test of the specified type.
BENCH_TYPE="cpu" BENCH_TAG="baseline" scripts/script.sh run
compare
Compare two benchmark runs side by side.
BENCH_ID="bench_a" BENCH_ID2="bench_b" scripts/script.sh compare
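A side-by-side comparison usually comes down to the difference between two scores. The following is a minimal sketch of that arithmetic, assuming each run exposes a numeric `score` as in the sample output; `compare_scores` is a hypothetical name, and the actual fields reported by `compare` may differ.

```python
def compare_scores(baseline, candidate):
    """Return the absolute and relative difference between two scores."""
    delta = candidate - baseline
    # A zero baseline has no meaningful percentage, so report None instead.
    percent = (delta * 100.0 / baseline) if baseline else None
    return {
        "baseline": baseline,
        "candidate": candidate,
        "delta": delta,
        "percent_change": percent,
    }


print(compare_scores(200.0, 250.0))
```

Whether a higher score means better performance depends on the benchmark type, so the sign of `percent_change` should be interpreted with that in mind.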
history
Show benchmark run history with optional filtering.
BENCH_TYPE="cpu" BENCH_TAG="baseline" scripts/script.sh history
report
Generate a detailed performance report.
BENCH_ID="bench_abc123" scripts/script.sh report
profile
Run a detailed profiling session with breakdown.
BENCH_TYPE="cpu" BENCH_DURATION="30" scripts/script.sh profile
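For readers unfamiliar with Python profiling, a per-function breakdown can be produced with the standard `cProfile` and `pstats` modules. This sketch is illustrative only: the workload and the `profile_call` helper are hypothetical, and the skill's own `profile` command may use different tooling or report different columns.

```python
import cProfile
import io
import pstats


def cpu_workload(n=20000):
    """Hypothetical stand-in workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Capture the report in memory instead of printing straight to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(cpu_workload, 20000)
print(report)
```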
stress
Run a sustained stress test.
BENCH_TYPE="cpu" BENCH_DURATION="60" BENCH_THREADS="4" scripts/script.sh stress
config
View or update benchmark configuration.
BENCH_KEY="default_duration" BENCH_VALUE="30" scripts/script.sh config
export
Export benchmark data in various formats.
BENCH_FORMAT="csv" scripts/script.sh export
list
List all benchmark runs.
scripts/script.sh list
status
Show benchmarking system status and summary.
scripts/script.sh status
help
Display usage information.
scripts/script.sh help
version
Display current version.
scripts/script.sh version
Output Format
All commands output structured JSON to stdout:
{
  "status": "success",
  "command": "run",
  "data": {
    "id": "bench_20240101_120000_abc123",
    "type": "cpu",
    "score": 15234.5,
    "duration_ms": 10000,
    "metrics": {}
  }
}
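Since every command prints JSON to stdout, a caller can parse the output directly. The sketch below uses the sample response shown above as input; `parse_output` is a hypothetical helper, and treating any status other than `"success"` as a failure is an assumption, because the document does not list the other status values.

```python
import json

# Sample response copied from the Output Format section.
SAMPLE_OUTPUT = """
{
  "status": "success",
  "command": "run",
  "data": {
    "id": "bench_20240101_120000_abc123",
    "type": "cpu",
    "score": 15234.5,
    "duration_ms": 10000,
    "metrics": {}
  }
}
"""


def parse_output(text):
    """Return the "data" object of a successful response."""
    payload = json.loads(text)
    # Assumption: anything other than "success" signals a failure.
    if payload.get("status") != "success":
        raise RuntimeError("command failed: %r" % payload)
    return payload["data"]


data = parse_output(SAMPLE_OUTPUT)
print(data["id"], data["type"], data["score"])
```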
Error Handling
| Exit Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Missing required parameter |
| 3 | Benchmark not found |
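When the script is driven from Python, the exit codes in the table above can be turned into readable messages. This is a sketch under assumptions: `describe_exit_code` and `run_benchmark_command` are hypothetical wrappers, and the script path is the relative one used in the command examples.

```python
import os
import subprocess

# Meanings copied from the exit code table.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Missing required parameter",
    3: "Benchmark not found",
}


def describe_exit_code(code):
    """Map an exit code to its documented meaning."""
    return EXIT_CODES.get(code, "Undocumented exit code %d" % code)


def run_benchmark_command(command, extra_env=None, script="scripts/script.sh"):
    """Run one script command and return (exit code, meaning, stdout)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    completed = subprocess.run(
        [script, command],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.returncode, describe_exit_code(completed.returncode), completed.stdout


print(describe_exit_code(2))
```

A call such as `run_benchmark_command("report", {"BENCH_ID": "bench_abc123"})` would then return the code, its meaning, and the raw JSON text for further parsing.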
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com