Performance Profiler

Claude

by alirezarezvani

Tier: POWERFUL

Installation

Install command

git clone https://github.com/alirezarezvani/claude-skills
# The skill is in engineering/performance-profiler

Documentation

Category: Engineering
Domain: Performance Engineering


Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

Core Capabilities

  • CPU profiling — flamegraphs for Node.js, py-spy for Python, pprof for Go
  • Memory profiling — heap snapshots, leak detection, GC pressure
  • Bundle analysis — webpack-bundle-analyzer, Next.js bundle analyzer
  • Database optimization — EXPLAIN ANALYZE, slow query log, N+1 detection
  • Load testing — k6 scripts, Artillery scenarios, ramp-up patterns
  • Before/after measurement — establish baseline, profile, optimize, verify

When to Use

  • App is slow and you don't know where the bottleneck is
  • P99 latency exceeds SLA before a release
  • Memory usage grows over time (suspected leak)
  • Bundle size increased after adding dependencies
  • Preparing for a traffic spike (load test before launch)
  • Database queries taking >100ms

Golden Rule: Measure First

bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage

# Wrong: "I think the N+1 query is slow, let me fix it"
# Right: Profile → confirm bottleneck → fix → measure again → verify improvement
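
The baseline metrics named above (P50, P95, P99) can be computed from raw latency samples. The following is a minimal sketch, assuming the nearest-rank percentile definition and a made-up sample set; `percentile`, `summarize`, and `latenciesMs` are illustrative names, not part of any tool mentioned here.

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  // Integer arithmetic first, to avoid floating-point surprises in the rank
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

function summarize(samples) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  }
}

// Hypothetical samples: 98 fast requests and 2 slow outliers.
const latenciesMs = [...Array(98).fill(40), 900, 1200]
console.log(summarize(latenciesMs)) // { p50: 40, p95: 40, p99: 900 }
```

In this sample the median and P95 look healthy while P99 exposes the outliers, which is why all three belong in the baseline.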

Node.js Profiling

CPU Flamegraph

bash
# Method 1: clinic.js (best for development)
npm install -g clinic

# CPU flamegraph
clinic flame -- node dist/server.js

# Heap profiler
clinic heapprofiler -- node dist/server.js

# Bubbleprof (async operation latency)
clinic bubbleprof -- node dist/server.js

# Generate load while profiling: --on-port runs autocannon once the server is listening
clinic flame --on-port 'autocannon -c 50 -d 30 localhost:$PORT/api/tasks' -- node dist/server.js
bash
# Method 2: Node.js built-in profiler
node --prof dist/server.js
# After running some load:
node --prof-process isolate-*.log | head -100
bash
# Method 3: V8 CPU profiler via inspector
node --inspect dist/server.js
# Open Chrome DevTools → Performance → Record

Heap Snapshot / Memory Leak Detection

javascript
// Add to your server for on-demand heap snapshots
import v8 from 'v8'
import fs from 'fs'

// Endpoint: POST /debug/heap-snapshot (protect with auth!)
app.post('/debug/heap-snapshot', (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`
  const snapshot = v8.writeHeapSnapshot(filename)
  res.json({ snapshot })
})
bash
# Take snapshots over time and compare in Chrome DevTools
curl -X POST http://localhost:3000/debug/heap-snapshot
# Wait 5 minutes of load
curl -X POST http://localhost:3000/debug/heap-snapshot
# Open both snapshots in Chrome → Memory → Compare

Detect Event Loop Blocking

javascript
// Add blocked-at to detect synchronous blocking
import blocked from 'blocked-at'

blocked((time, stack) => {
  console.warn(`Event loop blocked for ${time}ms`)
  console.warn(stack.join('\n'))
}, { threshold: 100 }) // Alert if blocked > 100ms
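
Where adding a dependency is not an option, Node's built-in `perf_hooks.monitorEventLoopDelay` gives a coarser view of the same problem: a histogram of event loop delay rather than a stack trace. This is a sketch under assumptions: `startLoopMonitor`, the 20 ms resolution, and the 10 second reporting interval are illustrative choices, not recommendations from the original text.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks')

// Histogram values are reported in nanoseconds; convert to milliseconds.
function toMs(ns) {
  return Number((ns / 1e6).toFixed(2))
}

function startLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs })
  histogram.enable()
  return {
    // Snapshot of the delay distribution recorded so far.
    report() {
      return {
        meanMs: toMs(histogram.mean),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
      }
    },
    stop() {
      histogram.disable()
    },
  }
}

const monitor = startLoopMonitor()
// Log a summary every 10 seconds; unref() keeps this timer from holding the process open.
const timer = setInterval(() => console.log(monitor.report()), 10_000)
timer.unref()
```

A high `p99Ms` or `maxMs` says the loop was blocked, but not where; blocked-at or a flamegraph is still needed to find the offending call.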

Node.js Memory Profiling Script

javascript
// scripts/memory-profile.mjs
// Run: node --expose-gc scripts/memory-profile.mjs

// Placeholder: replace with the operation you want to measure
async function someOperation() {}

function formatBytes(bytes) {
  return (bytes / 1024 / 1024).toFixed(2) + ' MB'
}

function measureMemory(label) {
  const mem = process.memoryUsage()
  console.log(`\n[${label}]`)
  console.log(`  RSS:       ${formatBytes(mem.rss)}`)
  console.log(`  Heap Used: ${formatBytes(mem.heapUsed)}`)
  console.log(`  Heap Total:${formatBytes(mem.heapTotal)}`)
  console.log(`  External:  ${formatBytes(mem.external)}`)
  return mem
}

const baseline = measureMemory('Baseline')

// Simulate your operation
for (let i = 0; i < 1000; i++) {
  // Replace with your actual operation
  const result = await someOperation()
}

const after = measureMemory('After 1000 operations')

console.log(`\n[Delta]`)
console.log(`  Heap Used: +${formatBytes(after.heapUsed - baseline.heapUsed)}`)

// If heap keeps growing across GC cycles, you have a leak
global.gc?.() // Run with --expose-gc flag
const afterGC = measureMemory('After GC')
if (afterGC.heapUsed > baseline.heapUsed * 1.1) {
  console.warn('⚠️  Possible memory leak detected (>10% growth after GC)')
}
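
A single before/after comparison can be noisy. The "keeps growing across GC cycles" check can be sketched as a trend over several rounds. This is a rough illustration only: `growsEveryRound`, `sampleHeap`, the 1 MB threshold, and the deliberately leaky `leakyOperation` are all hypothetical, and steady growth is a signal to take heap snapshots, not proof of a leak.

```javascript
// True when every post-GC heap sample exceeds the previous one by at least
// minGrowthBytes: a pattern consistent with a leak, not proof of one.
function growsEveryRound(heapSamples, minGrowthBytes = 1024 * 1024) {
  if (heapSamples.length < 3) return false // too few rounds to call it a trend
  return heapSamples.every(
    (value, i) => i === 0 || value - heapSamples[i - 1] >= minGrowthBytes
  )
}

// Run `operation` in rounds and record heapUsed after a forced GC each time.
async function sampleHeap(operation, rounds = 5, iterationsPerRound = 1000) {
  const samples = []
  for (let round = 0; round < rounds; round++) {
    for (let i = 0; i < iterationsPerRound; i++) await operation()
    global.gc?.() // only available with: node --expose-gc
    samples.push(process.memoryUsage().heapUsed)
  }
  return samples
}

// Hypothetical leaky operation: retains a 1024-element array on every call.
const retained = []
const leakyOperation = async () => {
  retained.push(new Array(1024).fill(0))
}

sampleHeap(leakyOperation).then((samples) => {
  console.log(growsEveryRound(samples) ? 'heap grew after every round' : 'no steady growth')
})
```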

Python Profiling

CPU Profiling with py-spy

bash
# Install
pip install py-spy

# Profile a running process (no code changes needed)
py-spy top --pid $(pgrep -f "uvicorn")

# Generate flamegraph SVG
py-spy record -o flamegraph.svg --pid $(pgrep -f "uvicorn") --duration 30

# Profile from the start
py-spy record -o flamegraph.svg -- python -m uvicorn app.main:app

# Open flamegraph.svg in browser — look for wide bars = hot code paths

cProfile for function-level profiling

python
# scripts/profile_endpoint.py
import cProfile
import pstats
import io
from app.services.task_service import TaskService

def run():
    service = TaskService()
    for _ in range(100):
        service.list_tasks(user_id="user_1", page=1, limit=20)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Print top 20 functions by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(20)
print(stream.getvalue())

Memory profiling with memory_profiler

python
# pip install memory-profiler
from memory_profiler import profile

@profile
def my_function():
    # Function to profile
    data = load_large_dataset()
    result = process(data)
    return result
bash
# Run with line-by-line memory tracking
python -m memory_profiler scripts/profile_function.py

# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#     10   45.3 MiB   45.3 MiB   def my_function():
#     11   78.1 MiB   32.8 MiB       data = load_large_dataset()
#     12  156.2 MiB   78.1 MiB       result = process(data)

Go Profiling with pprof

go
// main.go — add pprof endpoints
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers handlers under /debug/pprof/
)

func main() {
    // pprof endpoints at /debug/pprof/
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()
    // ... rest of your app
}
bash
# CPU profile (30s); quote URLs so the shell does not interpret "?"
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Memory profile
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

# Goroutine leak detection
curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'

# In pprof UI: "Flame Graph" view → find the widest bars (most time spent)

Bundle Size Analysis

Next.js Bundle Analyzer

bash
# Install
pnpm add -D @next/bundle-analyzer
javascript
// next.config.js
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
})
module.exports = withBundleAnalyzer({})
bash
# Run analyzer
ANALYZE=true pnpm build
# Opens browser with treemap of bundle

What to look for

bash
# Find the largest chunks
pnpm build 2>&1 | grep -E "^\s+(λ|○|●)" | sort -k4 -rh | head -20

# Check if a specific package is too large
# Visit: https://bundlephobia.com/package/moment@2.29.4
# moment: 67.9kB gzipped → replace with date-fns (13.8kB) or dayjs (6.9kB)

# Find duplicate packages
pnpm dedupe --check

# Visualize what's in a chunk
npx source-map-explorer .next/static/chunks/*.js

Common bundle wins

typescript
// Before: import entire lodash
import _ from 'lodash'  // 71kB

// After: import only what you need
import debounce from 'lodash/debounce'  // 2kB

// Before: moment.js
import moment from 'moment'  // 67kB

// After: dayjs
import dayjs from 'dayjs'  // 7kB

// Before: static import (always in bundle)
import HeavyChart from '@/components/HeavyChart'

// After: dynamic import (loaded on demand)
import dynamic from 'next/dynamic'
const HeavyChart = dynamic(() => import('@/components/HeavyChart'), {
  loading: () => <Skeleton />,
})

Database Query Optimization

Find slow queries

sql
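-- PostgreSQL: enable pg_stat_statements
-- (requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf and a restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 20 slowest queries by mean time (PostgreSQL 13+; older versions use mean_time / total_time)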
SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 80) AS query
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Reset stats
SELECT pg_stat_statements_reset();
bash
# MySQL slow query log
mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 0.1;"
tail -f /var/log/mysql/slow-query.log

EXPLAIN ANALYZE

sql
-- Always use EXPLAIN (ANALYZE, BUFFERS) for real timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, u.name as assignee_name
FROM tasks t
LEFT JOIN users u ON u.id = t.assignee_id
WHERE t.project_id = 'proj_123'
  AND t.deleted_at IS NULL
ORDER BY t.created_at DESC
LIMIT 20;

-- Look for:
-- Seq Scan on large table → needs index
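-- Nested Loop with high loops/rows → check for a missing index on the inner table
--   (N+1 is an application-level pattern: it shows up as many separate queries, not in one plan)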
-- Sort → can index handle the sort?
-- Hash Join → fine for moderate sizes
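
When plans are reviewed often, the first item in that list can be checked mechanically. The sketch below scans text-format `EXPLAIN ANALYZE` output for sequential scans that returned many rows. It rests on assumptions: `findLargeSeqScans`, the 10,000-row threshold, and the sample plan are made up for illustration, and matching plan text with a regular expression is fragile; `EXPLAIN (FORMAT JSON)` is a sturdier input for real tooling.

```javascript
// Flag sequential scans that actually returned many rows in a text-format plan.
function findLargeSeqScans(planText, minRows = 10000) {
  const pattern =
    /Seq Scan on (\S+).*actual time=[\d.]+\.\.[\d.]+ rows=([\d.]+) loops=(\d+)/
  const findings = []
  for (const line of planText.split('\n')) {
    const match = pattern.exec(line)
    if (!match) continue
    const [, table, rows, loops] = match
    const totalRows = Number(rows) * Number(loops) // rows is a per-loop average
    if (totalRows >= minRows) findings.push({ table, totalRows })
  }
  return findings
}

// Hypothetical plan output, shaped like PostgreSQL's text format.
const samplePlan = [
  'Limit  (cost=2084.00..2084.05 rows=20 width=105) (actual time=14.2..14.3 rows=20 loops=1)',
  '  ->  Sort  (cost=2084.00..2096.50 rows=5000 width=105) (actual time=14.2..14.2 rows=20 loops=1)',
  '        ->  Seq Scan on tasks t  (cost=0.00..1951.00 rows=5000 width=105) (actual time=0.01..12.9 rows=48210 loops=1)',
].join('\n')

console.log(findLargeSeqScans(samplePlan)) // [ { table: 'tasks', totalRows: 48210 } ]
```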

Detect N+1 Queries

typescript
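// Add query logging in dev
import { drizzle } from 'drizzle-orm/node-postgres'
import { pool } from './client'  // assumed: your existing pg Pool export

// Drizzle: `logger: true` prints every query; a custom logger can count them
let queryCount = 0
const db = drizzle(pool, {
  logger: { logQuery: () => { queryCount++ } },
})

// Prisma equivalent: create the client with log: [{ emit: 'event', level: 'query' }]
// and count with prisma.$on('query', () => queryCount++)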

// In tests:
queryCount = 0
const tasks = await getTasksWithAssignees(projectId)
expect(queryCount).toBe(1)  // Fail if it's 21 (1 + 20 N+1s)
python
# Django: detect N+1 with django-silk or nplusone
# settings.py (dev/test settings), after INSTALLED_APPS and MIDDLEWARE are defined
INSTALLED_APPS += ['nplusone.ext.django']
MIDDLEWARE.insert(0, 'nplusone.ext.django.NPlusOneMiddleware')
NPLUSONE_RAISE = True  # Raise exception on N+1 in tests

Fix N+1 — Before/After

typescript
// Before: N+1 (1 query for tasks + N queries for assignees)
const tasks = await db.select().from(tasksTable)
for (const task of tasks) {
  task.assignee = await db.select().from(usersTable)
    .where(eq(usersTable.id, task.assigneeId))
    .then(r => r[0])
}

// After: 1 query with JOIN
const tasks = await db
  .select({
    id: tasksTable.id,
    title: tasksTable.title,
    assigneeName: usersTable.name,
    assigneeEmail: usersTable.email,
  })
  .from(tasksTable)
  .leftJoin(usersTable, eq(usersTable.id, tasksTable.assigneeId))
  .where(eq(tasksTable.projectId, projectId))
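
When a JOIN is awkward (for example, the related rows come from another service), batching is the other common fix: collect the foreign keys, fetch them in one round trip, and attach the results in memory. A minimal sketch with an in-memory stand-in for the database; `fetchUsersByIds`, `attachAssignees`, and the sample rows are hypothetical and not taken from the code above.

```javascript
// Hypothetical in-memory stand-in for a users table and a batched lookup.
const usersTable = [
  { id: 'u1', name: 'Ada' },
  { id: 'u2', name: 'Linus' },
]
let queryCount = 0
async function fetchUsersByIds(ids) {
  queryCount++ // one round trip, e.g. WHERE id IN (...)
  return usersTable.filter((u) => ids.includes(u.id))
}

// Attach assignees with one batched lookup instead of one query per task.
async function attachAssignees(tasks) {
  const ids = [...new Set(tasks.map((t) => t.assigneeId).filter(Boolean))]
  const users = ids.length > 0 ? await fetchUsersByIds(ids) : []
  const byId = new Map(users.map((u) => [u.id, u]))
  return tasks.map((t) => ({ ...t, assignee: byId.get(t.assigneeId) ?? null }))
}

const tasks = [
  { id: 't1', assigneeId: 'u1' },
  { id: 't2', assigneeId: 'u2' },
  { id: 't3', assigneeId: 'u1' },
  { id: 't4', assigneeId: null },
]
attachAssignees(tasks).then((result) => {
  console.log(result.map((t) => t.assignee?.name ?? 'unassigned'), { queryCount })
})
```

Four tasks resolve with a single lookup here; the per-task loop in the "Before" version would have issued one query per task.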

Load Testing with k6

javascript
// tests/load/api-load-test.js
import http from 'k6/http'
import { check, sleep } from 'k6'
import { Rate, Trend } from 'k6/metrics'

const errorRate = new Rate('errors')
const taskListDuration = new Trend('task_list_duration')

export const options = {
  stages: [
    { duration: '30s', target: 10 },   // Ramp up to 10 VUs
    { duration: '1m',  target: 50 },   // Ramp to 50 VUs
    { duration: '2m',  target: 50 },   // Sustain 50 VUs
    { duration: '30s', target: 100 },  // Spike to 100 VUs
    { duration: '1m',  target: 50 },   // Back to 50
    { duration: '30s', target: 0 },    // Ramp down
  ],
  thresholds: {
    // Both percentiles go in one array: a repeated object key would overwrite the first
    http_req_duration: ['p(95)<500', 'p(99)<1000'],  // 95% < 500ms, 99% < 1s
    errors: ['rate<0.01'],              // Error rate < 1%
    task_list_duration: ['p(95)<200'],  // Task list specifically < 200ms
  },
}

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'

export function setup() {
  // Get auth token once
  const loginRes = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
    email: 'loadtest@example.com',
    password: 'loadtest123',
  }), { headers: { 'Content-Type': 'application/json' } })
  
  return { token: loginRes.json('token') }
}

export default function(data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  }
  
  // Scenario 1: List tasks
  const listRes = http.get(`${BASE_URL}/api/tasks?limit=20`, { headers })
  taskListDuration.add(listRes.timings.duration)  // request time as measured by k6

  const listOk = check(listRes, {
    'list tasks: status 200': (r) => r.status === 200,
    'list tasks: has items': (r) => r.json('items') !== undefined,
  })
  errorRate.add(!listOk)  // record successes too, so the rate is failures / total
  
  sleep(0.5)
  
  // Scenario 2: Create task
  const createRes = http.post(
    `${BASE_URL}/api/tasks`,
    JSON.stringify({ title: `Load test task ${Date.now()}`, priority: 'medium' }),
    { headers }
  )
  
  const createOk = check(createRes, {
    'create task: status 201': (r) => r.status === 201,
  })
  errorRate.add(!createOk)  // record successes too, so the rate is failures / total
  
  sleep(1)
}

export function teardown(data) {
  // Cleanup: delete load test tasks
}
bash
# Run load test
k6 run tests/load/api-load-test.js \
  --env BASE_URL=https://staging.myapp.com

# Stream results to InfluxDB (e.g. as a data source for Grafana dashboards)
k6 run --out influxdb=http://localhost:8086/k6 tests/load/api-load-test.js

Before/After Measurement Template

markdown
## Performance Optimization: [What You Fixed]

**Date:** 2026-03-01  
**Engineer:** @username  
**Ticket:** PROJ-123  

### Problem
[1-2 sentences: what was slow, how was it observed]

### Root Cause
[What the profiler revealed]

### Baseline (Before)
| Metric | Value |
|--------|-------|
| P50 latency | 480ms |
| P95 latency | 1,240ms |
| P99 latency | 3,100ms |
| RPS @ 50 VUs | 42 |
| Error rate | 0.8% |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

### Fix Applied
[What changed — code diff or description]

### After
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| P50 latency | 480ms | 48ms | -90% |
| P95 latency | 1,240ms | 120ms | -90% |
| P99 latency | 3,100ms | 280ms | -91% |
| RPS @ 50 VUs | 42 | 380 | +805% |
| Error rate | 0.8% | 0% | -100% |
| DB queries/req | 23 | 1 | -96% |

### Verification
Load test run: [link to k6 output]
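
The Delta column in the template is the percentage change relative to the baseline. A small helper avoids arithmetic slips when filling it in. This is a sketch; `percentChange` and `formatDelta` are illustrative names, and rounding to a whole percent is an assumption about the desired precision.

```javascript
// Percentage change from a baseline value, rounded to a whole percent.
function percentChange(before, after) {
  if (before === 0) throw new Error('baseline is zero; percentage change is undefined')
  return Math.round(((after - before) / before) * 100)
}

// Format with an explicit sign, matching the template's Delta column.
function formatDelta(before, after) {
  const change = percentChange(before, after)
  return `${change > 0 ? '+' : ''}${change}%`
}

console.log(formatDelta(480, 48)) // P50 latency in ms → "-90%"
console.log(formatDelta(23, 1))   // DB queries per request → "-96%"
```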

Optimization Checklist

Quick wins (check these first)

code
Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating new connection per request)

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → per-function imports (e.g. lodash/debounce)
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of JOIN
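
The "serial awaits" item in the API checklist is easy to miss in review, so a side-by-side sketch may help. It assumes three independent calls of about 50 ms each; `fakeFetch`, `loadSerial`, `loadParallel`, and `timeIt` are hypothetical stand-ins for real I/O.

```javascript
const { performance } = require('node:perf_hooks')

// Hypothetical async call that resolves with `value` after about delayMs.
const fakeFetch = (value, delayMs) =>
  new Promise((resolve) => setTimeout(() => resolve(value), delayMs))

// Serial: total time is roughly the sum of the individual delays.
async function loadSerial() {
  const user = await fakeFetch('user', 50)
  const tasks = await fakeFetch('tasks', 50)
  const stats = await fakeFetch('stats', 50)
  return [user, tasks, stats]
}

// Parallel: the calls are independent, so total time is roughly the slowest one.
async function loadParallel() {
  return Promise.all([
    fakeFetch('user', 50),
    fakeFetch('tasks', 50),
    fakeFetch('stats', 50),
  ])
}

async function timeIt(label, fn) {
  const start = performance.now()
  const result = await fn()
  console.log(`${label}: ${Math.round(performance.now() - start)}ms`)
  return result
}

timeIt('serial', loadSerial).then(() => timeIt('parallel', loadParallel))
```

`Promise.all` only applies when the calls do not depend on each other's results, and it rejects as soon as any one call fails.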

Common Pitfalls

  • Optimizing without measuring — you'll optimize the wrong thing
  • Testing in development — profile against production-like data volumes
  • Ignoring P99 — P50 can look fine while P99 is catastrophic
  • Premature optimization — fix correctness first, then performance
  • Not re-measuring — always verify the fix actually improved things
  • Load testing production — use staging with production-size data

Best Practices

  1. Baseline first, always — record metrics before touching anything
  2. One change at a time — isolate the variable to confirm causation
  3. Profile with realistic data — 10 rows in dev, millions in prod — different bottlenecks
  4. Set performance budgets — enforce p(95) < 200ms as a k6 threshold in CI
  5. Monitor continuously — add Datadog/Prometheus metrics for key paths
  6. Cache invalidation strategy — cache aggressively, invalidate precisely
  7. Document the win — before/after in the PR description motivates the team
