Database Optimizer
neon-postgres-egress-optimizer
by andrelandgraf
Install
claude skill add --url github.com/openclaw/skills/tree/main/skills/andrelandgraf/neon-postgres-egress-optimizer
Documentation
Postgres Egress Optimizer
Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.
Step 1: Diagnose
Identify which queries transfer the most data. The primary tool is the pg_stat_statements extension.
Check if pg_stat_statements is available
SELECT 1 FROM pg_stat_statements LIMIT 1;
If this errors, the extension needs to be created:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
On Neon, the extension is preinstalled, but you may still need to run this CREATE EXTENSION statement to enable it in your database.
Handle empty stats
Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up:
- Reset the stats to start a clean measurement window:
SELECT pg_stat_statements_reset();
- Let the application run under representative traffic for at least an hour.
- Return and run the diagnostic queries below.
If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly — code-level patterns are often sufficient to identify the worst offenders.
Diagnostic queries
Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.
Queries returning the most total rows:
SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;
Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):
SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY avg_rows_per_call DESC
LIMIT 10;
Most frequently called queries (candidates for caching):
SELECT query, calls, rows AS total_rows, round(rows::numeric / calls, 1) AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;
Longest running queries (not a direct egress measure, but helps identify problem queries during a spike):
SELECT query, calls, rows AS total_rows,
round(total_exec_time::numeric, 2) AS total_exec_time_ms
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 10;
Interpret the results
Rank findings by estimated egress impact:
- High row count + wide rows = biggest egress. A query returning 1,000 rows where each row includes a 50KB JSONB column transfers ~50MB per call.
- Extreme call frequency on even small queries adds up. A query called 50,000 times/day returning 10 rows each = 500,000 rows/day.
- Cross-reference with the schema to identify which columns are wide. Look for JSONB, TEXT, BYTEA, and large VARCHAR columns.
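To cross-reference with the schema, the wide column types can be listed straight from information_schema. A minimal sketch using the node-postgres ("pg") client; the DATABASE_URL environment variable, the 'public' schema filter, and the 1000-character VARCHAR cutoff are assumptions:

// Sketch: list columns whose types tend to produce wide rows.
import { Client } from "pg";

async function findWideColumns() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(`
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND (data_type IN ('jsonb', 'json', 'text', 'bytea')
           OR (data_type = 'character varying'
               AND character_maximum_length > 1000))
    ORDER BY table_name, column_name`);
  for (const col of rows) {
    console.log(`${col.table_name}.${col.column_name} (${col.data_type})`);
  }
  await client.end();
}

findWideColumns().catch(console.error);

For a column that looks suspicious, SELECT avg(pg_column_size(col)) FROM tbl reports its average stored size (possibly TOAST-compressed), which is a rough proxy for wire size.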
Step 2: Analyze codebase
For each query identified in Step 1, or for each database query in the codebase if no stats are available, check:
- Does it select only the columns the response needs?
- Does it return a bounded number of rows (LIMIT/pagination)?
- Is it called frequently enough to benefit from caching?
- Does it fetch raw data that gets aggregated in application code?
- Does it use a JOIN that duplicates parent data across child rows?
Step 3: Fix
Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.
Unused columns (SELECT *)
Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.
Before:
SELECT * FROM products;
After:
SELECT id, name, price, image_urls FROM products;
Missing pagination
Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.
This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.
Before:
SELECT id, name, price FROM products;
After:
SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;
When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
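As a sketch of sensible defaults, a hypothetical list endpoint can clamp the page size server-side. node-postgres, the parameter names, and the 50/100 limits are assumptions, not part of the skill:

// Sketch: paginated list handler with server-side clamping.
import { Client } from "pg";

const DEFAULT_LIMIT = 50; // assumed default page size
const MAX_LIMIT = 100;    // assumed hard cap

export async function listProducts(rawLimit?: string, rawOffset?: string) {
  // NaN from parseInt falls back to the default before clamping.
  const limit = Math.min(parseInt(rawLimit ?? "", 10) || DEFAULT_LIMIT, MAX_LIMIT);
  const offset = Math.max(parseInt(rawOffset ?? "", 10) || 0, 0);

  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Parameterized LIMIT/OFFSET keeps the query safe from injection.
    const { rows } = await client.query(
      "SELECT id, name, price FROM products ORDER BY id LIMIT $1 OFFSET $2",
      [limit, offset]
    );
    // Echoing limit/offset documents the pagination contract in the response.
    return { items: rows, limit, offset };
  } finally {
    await client.end();
  }
}

For deep pages, keyset pagination (WHERE id > $last ORDER BY id LIMIT n) avoids the cost of scanning skipped rows, though egress per page stays the same.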
High-frequency queries on static data
Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from pg_stat_statements — the code itself looks normal.
Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.
Fix: Add a caching layer between the application and the database so repeated requests are served from memory instead of re-fetching the same rows.
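A minimal in-process sketch; the five-minute TTL, the cache key, and the db client stand-in are assumptions, and a shared cache such as Redis follows the same shape:

// Sketch: in-process TTL cache so static data hits Postgres once per window.
type Entry<T> = { value: T; expiresAt: number };

const cache = new Map<string, Entry<unknown>>();

async function cached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T;
  const value = await load();
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// "db" stands in for whatever query client the application already uses.
declare const db: { query(sql: string): Promise<{ rows: unknown[] }> };

async function getCategories() {
  // The category list now transfers from Postgres at most once every
  // five minutes per process, instead of once per request.
  return cached("categories", 5 * 60_000, async () => {
    const { rows } = await db.query("SELECT id, name FROM categories ORDER BY name");
    return rows;
  });
}

If staleness matters, pair the TTL with explicit invalidation on writes; in multi-instance deployments, a shared cache keeps each process from warming its own copy.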
Application-side aggregation
Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.
Fix: Push the aggregation into SQL.
Before: The application fetches entire tables and aggregates in code with loops or .reduce().
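A hedged sketch of that anti-pattern, using the same tables as the SQL below (db stands in for the app's query client):

// Anti-pattern sketch: every review row crosses the wire just to be
// collapsed into one average per category in application code.
declare const db: { query(sql: string): Promise<{ rows: any[] }> };

async function avgRatingByCategory() {
  const { rows } = await db.query(
    "SELECT r.rating, p.category_id FROM reviews r JOIN products p ON r.product_id = p.id"
  );
  const byCategory = new Map<number, { sum: number; count: number }>();
  for (const r of rows) {
    const e = byCategory.get(r.category_id) ?? { sum: 0, count: 0 };
    e.sum += r.rating;
    e.count += 1;
    byCategory.set(r.category_id, e);
  }
  return new Map([...byCategory].map(([id, e]) => [id, e.sum / e.count]));
}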
After:
SELECT p.category_id,
AVG(r.rating) AS avg_rating,
COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
JOIN duplication
Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.
This is distinct from the SELECT * problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.
Before:
SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;
After (two separate queries):
SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, user_name, rating, body FROM reviews WHERE product_id = 1;
Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
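On the application side, the two queries can run concurrently and be reassembled into the response shape the JOIN used to produce. A sketch, with db standing in for the app's query client:

// Sketch: two narrow queries in parallel instead of one duplicating JOIN.
declare const db: {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>;
};

async function getProductWithReviews(productId: number) {
  const [productRes, reviewRes] = await Promise.all([
    db.query(
      "SELECT id, name, price, description, image_urls FROM products WHERE id = $1",
      [productId]
    ),
    db.query(
      "SELECT id, user_name, rating, body FROM reviews WHERE product_id = $1",
      [productId]
    ),
  ]);
  // The product row transfers once, no matter how many reviews exist.
  return { product: productRes.rows[0], reviews: reviewRes.rows };
}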
Step 4: Verify
After applying fixes:
- Run existing tests to confirm nothing broke.
- Check the responses — make sure the API still returns the same data shape. Column selection and pagination changes can break clients that depend on specific fields or full result sets.
- Measure the improvement: if pg_stat_statements data is available, reset it (SELECT pg_stat_statements_reset();), let traffic run, then re-run the diagnostic queries to compare before and after.
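For the before/after comparison, a small script can dump the top row-producing queries in each window. A sketch assuming node-postgres and a DATABASE_URL environment variable:

// Sketch: print the top row-producing queries so two measurement windows
// can be diffed by hand.
import { Client } from "pg";

async function topRowProducers(label: string) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(`
    SELECT query, calls, rows AS total_rows
    FROM pg_stat_statements
    WHERE calls > 0
    ORDER BY rows DESC
    LIMIT 10`);
  console.log(`--- ${label} ---`);
  for (const r of rows) {
    console.log(r.total_rows, r.calls, r.query.slice(0, 80));
  }
  await client.end();
}

topRowProducers(process.argv[2] ?? "snapshot").catch(console.error);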