数据库设计
Database Designer - POWERFUL Tier Skill
by alirezarezvani
聚焦数据库 Schema 设计与演进,自动检查规范化、数据类型、约束和索引问题,生成 ERD,并为零停机迁移、数据变更和回滚提供可执行方案。
专注数据库设计与数据建模,帮你快速理清表结构和关系,减少后期返工,SQL 落地也更顺手。
安装
claude skill add --url github.com/alirezarezvani/claude-skills/tree/main/engineering/database-designer文档
Overview
A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.
Core Competencies
Schema Design & Analysis
- Normalization Analysis: Automated detection of normalization levels (1NF through BCNF)
- Denormalization Strategy: Smart recommendations for performance optimization
- Data Type Optimization: Identification of inappropriate types and size issues
- Constraint Analysis: Missing foreign keys, unique constraints, and null checks
- Naming Convention Validation: Consistent table and column naming patterns
- ERD Generation: Automatic Mermaid diagram creation from DDL
Index Optimization
- Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns
- Composite Index Strategy: Optimal column ordering for multi-column indexes
- Index Redundancy Detection: Elimination of overlapping and unused indexes
- Performance Impact Modeling: Selectivity estimation and query cost analysis
- Index Type Selection: B-tree, hash, partial, covering, and specialized indexes
Migration Management
- Zero-Downtime Migrations: Expand-contract pattern implementation
- Schema Evolution: Safe column additions, deletions, and type changes
- Data Migration Scripts: Automated data transformation and validation
- Rollback Strategy: Complete reversal capabilities with validation
- Execution Planning: Ordered migration steps with dependency resolution
Database Design Principles
Normalization Forms
First Normal Form (1NF)
- Atomic Values: Each column contains indivisible values
- Unique Column Names: No duplicate column names within a table
- Uniform Data Types: Each column contains the same type of data
- Row Uniqueness: No duplicate rows in the table
Example Violation:
-- BAD: Multiple phone numbers in one column
CREATE TABLE contacts (
id INT PRIMARY KEY,
name VARCHAR(100),
phones VARCHAR(200) -- "123-456-7890, 098-765-4321"
);
-- GOOD: Separate table for phone numbers
CREATE TABLE contacts (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE contact_phones (
id INT PRIMARY KEY,
contact_id INT REFERENCES contacts(id),
phone_number VARCHAR(20),
phone_type VARCHAR(10)
);
Second Normal Form (2NF)
- 1NF Compliance: Must satisfy First Normal Form
- Full Functional Dependency: Non-key attributes depend on the entire primary key
- Partial Dependency Elimination: Remove attributes that depend on part of a composite key
Example Violation:
-- BAD: Student course table with partial dependencies
CREATE TABLE student_courses (
student_id INT,
course_id INT,
student_name VARCHAR(100), -- Depends only on student_id
course_name VARCHAR(100), -- Depends only on course_id
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
-- GOOD: Separate tables eliminate partial dependencies
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE courses (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE enrollments (
student_id INT REFERENCES students(id),
course_id INT REFERENCES courses(id),
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
Third Normal Form (3NF)
- 2NF Compliance: Must satisfy Second Normal Form
- Transitive Dependency Elimination: Non-key attributes should not depend on other non-key attributes
- Direct Dependency: Non-key attributes depend directly on the primary key
Example Violation:
-- BAD: Employee table with transitive dependency
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
department_name VARCHAR(100), -- Depends on department_id, not employee id
department_budget DECIMAL(10,2) -- Transitive dependency
);
-- GOOD: Separate department information
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(100),
budget DECIMAL(10,2)
);
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT REFERENCES departments(id)
);
Boyce-Codd Normal Form (BCNF)
- 3NF Compliance: Must satisfy Third Normal Form
- Determinant Key Rule: Every determinant must be a candidate key
- Stricter 3NF: Handles anomalies not covered by 3NF
Denormalization Strategies
When to Denormalize
- Read-Heavy Workloads: High query frequency with acceptable write trade-offs
- Performance Bottlenecks: Join operations causing significant latency
- Aggregation Needs: Frequent calculation of derived values
- Caching Requirements: Pre-computed results for common queries
Common Denormalization Patterns
Redundant Storage
-- Store calculated values to avoid expensive joins
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT REFERENCES customers(id),
customer_name VARCHAR(100), -- Denormalized from customers table
order_total DECIMAL(10,2), -- Denormalized calculation
created_at TIMESTAMP
);
Materialized Aggregates
-- Pre-computed summary tables
CREATE TABLE customer_statistics (
customer_id INT PRIMARY KEY,
total_orders INT,
lifetime_value DECIMAL(12,2),
last_order_date DATE,
updated_at TIMESTAMP
);
Index Optimization Strategies
B-Tree Indexes
- Default Choice: Best for range queries, sorting, and equality matches
- Column Order: Most selective columns first for composite indexes
- Prefix Matching: Supports leading column subset queries
- Maintenance Cost: Balanced tree structure with logarithmic operations
Hash Indexes
- Equality Queries: Optimal for exact match lookups
- Memory Efficiency: Constant-time access for single-value queries
- Range Limitations: Cannot support range or partial matches
- Use Cases: Primary keys, unique constraints, cache keys
Composite Indexes
-- Query pattern determines optimal column order
-- Query: WHERE status = 'active' AND created_date > '2023-01-01' ORDER BY priority DESC
CREATE INDEX idx_task_status_date_priority
ON tasks (status, created_date, priority DESC);
-- Query: WHERE user_id = 123 AND category IN ('A', 'B') AND date_field BETWEEN '...' AND '...'
CREATE INDEX idx_user_category_date
ON user_activities (user_id, category, date_field);
Covering Indexes
-- Include additional columns to avoid table lookups
CREATE INDEX idx_user_email_covering
ON users (email)
INCLUDE (first_name, last_name, status);
-- Query can be satisfied entirely from the index
-- SELECT first_name, last_name, status FROM users WHERE email = 'user@example.com';
Partial Indexes
-- Index only relevant subset of data
CREATE INDEX idx_active_users_email
ON users (email)
WHERE status = 'active';
-- Index for recent orders only
CREATE INDEX idx_recent_orders_customer
ON orders (customer_id, created_at)
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';
Query Analysis & Optimization
Query Patterns Recognition
- Equality Filters: Single-column B-tree indexes
- Range Queries: B-tree with proper column ordering
- Text Search: Full-text indexes or trigram indexes
- Join Operations: Foreign key indexes on both sides
- Sorting Requirements: Indexes matching ORDER BY clauses
Index Selection Algorithm
1. Identify WHERE clause columns
2. Determine most selective columns first
3. Consider JOIN conditions
4. Include ORDER BY columns if possible
5. Evaluate covering index opportunities
6. Check for existing overlapping indexes
Data Modeling Patterns
Star Schema (Data Warehousing)
-- Central fact table
CREATE TABLE sales_facts (
sale_id BIGINT PRIMARY KEY,
product_id INT REFERENCES products(id),
customer_id INT REFERENCES customers(id),
date_id INT REFERENCES date_dimension(id),
store_id INT REFERENCES stores(id),
quantity INT,
unit_price DECIMAL(8,2),
total_amount DECIMAL(10,2)
);
-- Dimension tables
CREATE TABLE date_dimension (
id INT PRIMARY KEY,
date_value DATE,
year INT,
quarter INT,
month INT,
day_of_week INT,
is_weekend BOOLEAN
);
Snowflake Schema
-- Normalized dimension tables
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(200),
category_id INT REFERENCES product_categories(id),
brand_id INT REFERENCES brands(id)
);
CREATE TABLE product_categories (
id INT PRIMARY KEY,
name VARCHAR(100),
parent_category_id INT REFERENCES product_categories(id)
);
Document Model (JSON Storage)
-- Flexible document storage with indexing
CREATE TABLE documents (
id UUID PRIMARY KEY,
document_type VARCHAR(50),
data JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Index on JSON properties
CREATE INDEX idx_documents_user_id
ON documents USING GIN ((data->>'user_id'));
CREATE INDEX idx_documents_status
ON documents ((data->>'status'))
WHERE document_type = 'order';
Graph Data Patterns
-- Adjacency list for hierarchical data
CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(100),
parent_id INT REFERENCES categories(id),
level INT,
path VARCHAR(500) -- Materialized path: "/1/5/12/"
);
-- Many-to-many relationships
CREATE TABLE relationships (
id UUID PRIMARY KEY,
from_entity_id UUID,
to_entity_id UUID,
relationship_type VARCHAR(50),
created_at TIMESTAMP,
INDEX (from_entity_id, relationship_type),
INDEX (to_entity_id, relationship_type)
);
Migration Strategies
Zero-Downtime Migration (Expand-Contract Pattern)
Phase 1: Expand
-- Add new column without constraints
ALTER TABLE users ADD COLUMN new_email VARCHAR(255);
-- Backfill data in batches
UPDATE users SET new_email = email WHERE id BETWEEN 1 AND 1000;
-- Continue in batches...
-- Add constraints after backfill
ALTER TABLE users ADD CONSTRAINT users_new_email_unique UNIQUE (new_email);
ALTER TABLE users ALTER COLUMN new_email SET NOT NULL;
Phase 2: Contract
-- Update application to use new column
-- Deploy application changes
-- Verify new column is being used
-- Remove old column
ALTER TABLE users DROP COLUMN email;
-- Rename new column
ALTER TABLE users RENAME COLUMN new_email TO email;
Data Type Changes
-- Safe string to integer conversion
ALTER TABLE products ADD COLUMN sku_number INTEGER;
UPDATE products SET sku_number = CAST(sku AS INTEGER) WHERE sku ~ '^[0-9]+$';
-- Validate conversion success before dropping old column
Partitioning Strategies
Horizontal Partitioning (Sharding)
-- Range partitioning by date
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Hash partitioning by user_id
CREATE TABLE user_data_0 PARTITION OF user_data
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_data_1 PARTITION OF user_data
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
Vertical Partitioning
-- Separate frequently accessed columns
CREATE TABLE users_core (
id INT PRIMARY KEY,
email VARCHAR(255),
status VARCHAR(20),
created_at TIMESTAMP
);
-- Less frequently accessed profile data
CREATE TABLE users_profile (
user_id INT PRIMARY KEY REFERENCES users_core(id),
bio TEXT,
preferences JSONB,
last_login TIMESTAMP
);
Connection Management
Connection Pooling
- Pool Size: CPU cores × 2 + effective spindle count
- Connection Lifetime: Rotate connections to prevent resource leaks
- Timeout Settings: Connection, idle, and query timeouts
- Health Checks: Regular connection validation
Read Replicas Strategy
-- Write queries to primary
INSERT INTO users (email, name) VALUES ('user@example.com', 'John Doe');
-- Read queries to replicas (with appropriate read preference)
SELECT * FROM users WHERE status = 'active'; -- Route to read replica
-- Consistent reads when required
SELECT * FROM users WHERE id = LAST_INSERT_ID(); -- Route to primary
Caching Layers
Cache-Aside Pattern
def get_user(user_id):
# Try cache first
user = cache.get(f"user:{user_id}")
if user is None:
# Cache miss - query database
user = db.query("SELECT * FROM users WHERE id = %s", user_id)
# Store in cache
cache.set(f"user:{user_id}", user, ttl=3600)
return user
Write-Through Cache
- Consistency: Always keep cache and database in sync
- Write Latency: Higher due to dual writes
- Data Safety: No data loss on cache failures
Cache Invalidation Strategies
- TTL-Based: Time-based expiration
- Event-Driven: Invalidate on data changes
- Version-Based: Use version numbers for consistency
- Tag-Based: Group related cache entries
Database Selection Guide
SQL Databases
PostgreSQL
- Strengths: ACID compliance, complex queries, JSON support, extensibility
- Use Cases: OLTP applications, data warehousing, geospatial data
- Scale: Vertical scaling with read replicas
MySQL
- Strengths: Performance, replication, wide ecosystem support
- Use Cases: Web applications, content management, e-commerce
- Scale: Horizontal scaling through sharding
NoSQL Databases
Document Stores (MongoDB, CouchDB)
- Strengths: Flexible schema, horizontal scaling, developer productivity
- Use Cases: Content management, catalogs, user profiles
- Trade-offs: Eventual consistency, complex queries limitations
Key-Value Stores (Redis, DynamoDB)
- Strengths: High performance, simple model, excellent caching
- Use Cases: Session storage, real-time analytics, gaming leaderboards
- Trade-offs: Limited query capabilities, data modeling constraints
Column-Family (Cassandra, HBase)
- Strengths: Write-heavy workloads, linear scalability, fault tolerance
- Use Cases: Time-series data, IoT applications, messaging systems
- Trade-offs: Query flexibility, consistency model complexity
Graph Databases (Neo4j, Amazon Neptune)
- Strengths: Relationship queries, pattern matching, recommendation engines
- Use Cases: Social networks, fraud detection, knowledge graphs
- Trade-offs: Specialized use cases, learning curve
NewSQL Databases
Distributed SQL (CockroachDB, TiDB, Spanner)
- Strengths: SQL compatibility with horizontal scaling
- Use Cases: Global applications requiring ACID guarantees
- Trade-offs: Complexity, latency for distributed transactions
Tools & Scripts
Schema Analyzer
- Input: SQL DDL files, JSON schema definitions
- Analysis: Normalization compliance, constraint validation, naming conventions
- Output: Analysis report, Mermaid ERD, improvement recommendations
Index Optimizer
- Input: Schema definition, query patterns
- Analysis: Missing indexes, redundancy detection, selectivity estimation
- Output: Index recommendations, CREATE INDEX statements, performance projections
Migration Generator
- Input: Current and target schemas
- Analysis: Schema differences, dependency resolution, risk assessment
- Output: Migration scripts, rollback plans, validation queries
Best Practices
Schema Design
- Use meaningful names: Clear, consistent naming conventions
- Choose appropriate data types: Right-sized columns for storage efficiency
- Define proper constraints: Foreign keys, check constraints, unique indexes
- Consider future growth: Plan for scale from the beginning
- Document relationships: Clear foreign key relationships and business rules
Performance Optimization
- Index strategically: Cover common query patterns without over-indexing
- Monitor query performance: Regular analysis of slow queries
- Partition large tables: Improve query performance and maintenance
- Use appropriate isolation levels: Balance consistency with performance
- Implement connection pooling: Efficient resource utilization
Security Considerations
- Principle of least privilege: Grant minimal necessary permissions
- Encrypt sensitive data: At rest and in transit
- Audit access patterns: Monitor and log database access
- Validate inputs: Prevent SQL injection attacks
- Regular security updates: Keep database software current
Conclusion
Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.
The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.
相关 Skills
资深数据工程师
by alirezarezvani
聚焦生产级数据工程,覆盖 ETL/ELT、批处理与流式管道、数据建模、Airflow/dbt/Spark 优化和数据质量治理,适合设计数据架构、搭建现代数据栈与排查性能问题。
✎ 复杂数据管道、ETL/ELT 和治理难题交给它,凭 Spark、Airflow、dbt 等现代数据栈经验,能更稳地搭起可扩展的数据基础设施。
技术栈评估
by alirezarezvani
对比框架、数据库和云服务,结合 5 年 TCO、安全风险、生态活力与迁移复杂度做量化评估,适合技术选型、栈升级和替换路线决策。
✎ 帮你系统比较技术栈优劣,不只看功能,还把TCO、安全性和生态健康度一起量化,选型和迁移决策更稳。
迁移架构师
by alirezarezvani
为数据库、API 与基础设施迁移制定分阶段零停机方案,提前校验兼容性与风险,生成回滚策略、验证关卡和时间线,适合复杂系统平滑切换。
✎ 做数据库与存储迁移时,用它统一梳理表结构和数据搬迁流程,架构视角更完整,复杂迁移也更稳。
相关 MCP 服务
SQLite 数据库
编辑精选by Anthropic
SQLite 是让 AI 直接查询本地数据库进行数据分析的 MCP 服务器。
✎ 这个服务器解决了 AI 无法直接访问 SQLite 数据库的问题,适合需要快速分析本地数据集的开发者。不过,作为参考实现,它可能缺乏生产级的安全特性,建议在受控环境中使用。
PostgreSQL 数据库
编辑精选by Anthropic
PostgreSQL 是让 Claude 直接查询和管理你的数据库的 MCP 服务器。
✎ 这个服务器解决了开发者需要手动编写 SQL 查询的痛点,特别适合数据分析师或后端开发者快速探索数据库结构。不过,由于是参考实现,生产环境使用前务必评估安全风险,别指望它能处理复杂事务。
Firecrawl 智能爬虫
编辑精选by Firecrawl
Firecrawl 是让 AI 直接抓取网页并提取结构化数据的 MCP 服务器。
✎ 它解决了手动写爬虫的麻烦,让 Claude 能直接访问动态网页内容。最适合需要实时数据的研究者或开发者,比如监控竞品价格或抓取新闻。但要注意,它依赖第三方 API,可能涉及隐私和成本问题。