# Database Migrations
This guide covers managing database schema changes in Fission Python projects.
## Table of Contents
1. [Overview](#overview)
2. [Migration Files](#migration-files)
3. [Applying Migrations](#applying-migrations)
4. [Writing Migrations](#writing-migrations)
5. [Best Practices](#best-practices)
6. [Rollback Strategies](#rollback-strategies)
7. [Automation](#automation)
## Overview
Database schema changes should be managed through versioned migration scripts, not manual `CREATE TABLE` statements.
This template uses **plain SQL migration files** (`.sql`), which provide:
- Version control of schema changes
- Repeatable application to different environments
- Clear upgrade/downgrade paths
- Audit trail of schema evolution
## Migration Files
Place SQL migration scripts in the `migrates/` directory:
```
migrates/
├── 001_initial_schema.sql
├── 002_add_user_email.sql
├── 003_create_indexes.sql
└── ...
```
**Naming convention**:
- Prefix with sequential number (zero-padded for sorting)
- Descriptive name after underscore
- `.sql` extension
- Numbers should be unique and monotonically increasing
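The convention above is easy to enforce with a small helper. A sketch (the function names are illustrative, not part of the template):

```python
import re

def parse_migration_filename(filename: str):
    """Split e.g. '002_add_user_email.sql' into (2, 'add_user_email')."""
    match = re.fullmatch(r"(\d+)_(.+)\.sql", filename)
    if not match:
        raise ValueError(f"Bad migration filename: {filename}")
    return int(match.group(1)), match.group(2)

def check_versions(filenames):
    """Return sorted version numbers, rejecting duplicates."""
    versions = sorted(parse_migration_filename(f)[0] for f in filenames)
    duplicates = {v for v in versions if versions.count(v) > 1}
    if duplicates:
        raise ValueError(f"Duplicate migration versions: {sorted(duplicates)}")
    return versions
```

Running this in CI catches a duplicated version number before it silently breaks ordering.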
### Initial Schema Example
```sql
-- migrates/001_create_items_table.sql
-- Create items table
CREATE TABLE IF NOT EXISTS items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    metadata JSONB,
    created TIMESTAMPTZ DEFAULT NOW(),
    modified TIMESTAMPTZ DEFAULT NOW()
);
-- Add indexes
CREATE INDEX idx_items_status ON items(status);
CREATE INDEX idx_items_created ON items(created);
-- Add comments
COMMENT ON TABLE items IS 'Stores item records';
COMMENT ON COLUMN items.status IS 'Item status: active, inactive, pending';
```
## Applying Migrations
### Manually
```bash
# Connect to database
psql -h localhost -U postgres -d mydb
# Run migration file
\i migrates/001_create_items_table.sql
# Run all migrations in order (bash script; glob expansion is already sorted)
for file in migrates/*.sql; do
  echo "Applying $file..."
  psql -h localhost -U postgres -d mydb -f "$file"
done
```
### Automatically from Python
Create a simple migration runner:
```python
# src/migrate.py (standalone script, not part of the function)
import os

from helpers import init_db_connection


def run_migrations():
    conn = init_db_connection()
    cursor = conn.cursor()

    # Create migrations tracking table if it does not exist
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version INTEGER PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            applied_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)
    conn.commit()

    # Get already-applied migrations
    cursor.execute("SELECT version FROM schema_migrations")
    applied = {row[0] for row in cursor.fetchall()}

    # Find migration files
    migrates_dir = os.path.join(os.path.dirname(__file__), "..", "migrates")
    files = sorted(f for f in os.listdir(migrates_dir) if f.endswith(".sql"))

    # Apply pending migrations
    for filename in files:
        # Extract version number from the filename prefix
        version = int(filename.split("_")[0])
        if version in applied:
            print(f"Skipping {filename} (already applied)")
            continue

        path = os.path.join(migrates_dir, filename)
        print(f"Applying {filename}...")
        with open(path, "r") as f:
            sql = f.read()
        try:
            cursor.execute(sql)
            cursor.execute(
                "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
                (version, filename),
            )
            conn.commit()
            print(f"  ✓ Applied {filename}")
        except Exception as e:
            conn.rollback()
            print(f"  ✗ Failed: {e}")
            raise

    conn.close()
    print("All migrations applied")


if __name__ == "__main__":
    run_migrations()
```
Run:
```bash
python src/migrate.py
```
### Using Migration Tools
For more advanced features (rollbacks, branching), consider:
- **[Alembic](https://alembic.sqlalchemy.org/)** - Database migration tool for SQLAlchemy (if using ORM)
- **[pg-migrator](https://github.com/heroku/pg-migrator)** - Heroku's migration tool
- **[goose](https://github.com/pressly/goose)** - Multi-database migration tool (can use from Python)
- **[yoyo-migrations](https://pypi.org/project/yoyo-migrations/)** - Python-based migrations
## Writing Migrations
### Principles
1. **Idempotent** - Script should succeed if run multiple times
2. **Additive first** - Add columns/tables before removing/dropping
3. **Backward compatible** - New schema should work with old code
4. **Atomic** - One logical change per migration file
5. **Test locally** - Apply to test database before production
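As a rough aid for principle 1, migration files can be linted for obviously non-idempotent statements. A heuristic sketch (illustrative only, and no substitute for running the script twice against a test database):

```python
def lint_idempotency(sql: str):
    """Heuristically flag statements likely to fail on a second run."""
    upper = sql.upper()
    warnings = []
    if "CREATE TABLE" in upper and "IF NOT EXISTS" not in upper:
        warnings.append("CREATE TABLE without IF NOT EXISTS")
    if "CREATE INDEX" in upper and "IF NOT EXISTS" not in upper:
        warnings.append("CREATE INDEX without IF NOT EXISTS")
    if "DROP TABLE" in upper and "IF EXISTS" not in upper:
        warnings.append("DROP TABLE without IF EXISTS")
    if "DROP COLUMN" in upper and "IF EXISTS" not in upper:
        warnings.append("DROP COLUMN without IF EXISTS")
    return warnings
```

Being substring-based, it cannot tell which statement a guard belongs to in a multi-statement file; treat its output as hints, not verdicts.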
### Common Operations
#### Create Table
```sql
CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Add foreign key
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_user
    FOREIGN KEY (user_id)
    REFERENCES users(id)
    ON DELETE CASCADE;

-- Index for performance
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
```
#### Add Column
```sql
-- Add nullable column (safe, backward compatible)
ALTER TABLE orders
    ADD COLUMN shipping_address JSONB;

-- Add column with a default
-- Before PostgreSQL 11 this rewrote the entire table; on 11+ it is a
-- fast metadata-only change for constant defaults
ALTER TABLE orders
    ADD COLUMN tax_amount DECIMAL(10,2) DEFAULT 0.00;
```
#### Rename Column
```sql
-- Renaming a column is a fast, metadata-only change in PostgreSQL
ALTER TABLE orders
    RENAME COLUMN total TO order_total;
```
#### Modify Column Type
```sql
-- Change VARCHAR length
ALTER TABLE users
    ALTER COLUMN email TYPE VARCHAR(320);

-- Convert to a different type (use a USING clause)
ALTER TABLE orders
    ALTER COLUMN status TYPE VARCHAR(100)
    USING status::VARCHAR(100);
```
#### Create Index
```sql
-- Simple index
CREATE INDEX idx_users_email ON users(email);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only active users)
CREATE INDEX idx_users_active ON users(id)
    WHERE status = 'active';

-- Multi-column index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
```
#### Drop Column/Table
```sql
-- First, ensure no code still reads or writes the column;
-- stop using it in one release, then drop it in a later migration

-- Drop column
ALTER TABLE orders
    DROP COLUMN IF EXISTS old_column;
-- Drop table (dangerous!)
DROP TABLE IF EXISTS old_logs;
```
### Data Migrations
Sometimes you need to transform data:
```sql
-- Backfill new column from existing data
UPDATE orders
SET shipping_address = jsonb_build_object(
    'street', address_street,
    'city', address_city,
    'zip', address_zip
)
WHERE shipping_address IS NULL;

-- Migrate enum values
UPDATE products
SET status = 'active' WHERE status = 'ACTIVE';

-- Clean up duplicates (keep the earliest row per email)
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);
```
### Transactional Migrations
Wrap critical migrations in transactions:
```sql
BEGIN;
-- Multiple related operations
ALTER TABLE orders ADD COLUMN shipping_id UUID;
UPDATE orders SET shipping_id = uuid_generate_v4() WHERE shipping_id IS NULL;
ALTER TABLE orders ALTER COLUMN shipping_id SET NOT NULL;
COMMIT;
```
**Note**: Unlike MySQL or Oracle, PostgreSQL runs most DDL inside transactions, so the `BEGIN`/`COMMIT` above works and a failed statement rolls the whole block back. The exceptions (`CREATE INDEX CONCURRENTLY`, `CREATE DATABASE`, and a few others) cannot run inside a transaction block. For complex multi-step changes, consider using advisory locks or deployment coordination.
## Best Practices
### ✅ Do's
1. **Test migrations on copy of production database** before applying to prod
2. **Keep migrations small** - One logical change per file
3. **Write data migrations as separate files** from schema migrations
4. **Use `IF NOT EXISTS` and `IF EXISTS`** to make migrations idempotent
5. **Never drop a column/table in the same migration that introduces its replacement** - Put the drop in a later migration so the earlier one can still be rolled back
6. **Document why** - Add comments explaining the purpose
7. **Consider indexes** - Add indexes for frequently queried columns in same migration as table creation
8. **Use UUIDs** for primary keys (`gen_random_uuid()` in PostgreSQL 13+)
9. **Add `created_at` and `updated_at` timestamps** to all tables
10. **Version numbers must be unique and sequential**
### ❌ Don'ts
1. **Don't modify already-applied migrations** - They're part of history
2. **Don't skip version numbers** - Gaps make missing or unapplied migrations harder to spot
3. **Don't use destructive operations without backup** - `DROP COLUMN`, `DROP TABLE`
4. **Don't run long-running migrations during peak hours** - Use low-traffic windows
5. **Don't add NOT NULL without default** on non-empty tables - Will fail due to existing NULL rows
6. **Don't assume order of execution** - Always number sequentially
7. **Don't mix unrelated changes** in one migration file
### Zero-Downtime Migrations
#### Adding Column
```sql
-- Step 1: Add column as nullable or with default (fast)
ALTER TABLE orders ADD COLUMN status VARCHAR(50);
-- Step 2: Deploy code that writes to new column
-- Your application updates to populate status
-- Step 3: Backfill existing rows (if needed)
UPDATE orders SET status = 'completed' WHERE status IS NULL AND shipped_at IS NOT NULL;
-- Step 4: Make column NOT NULL (if needed) - only after all rows have values
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```
#### Renaming Column
```sql
-- Step 1: Add new column
ALTER TABLE orders ADD COLUMN order_status VARCHAR(50);
-- Step 2: Deploy code writing to both old and new columns (dual-write)
-- Step 3: Backfill data
UPDATE orders SET order_status = status;
-- Step 4: Deploy code reading from new column, stop writing to old
-- Step 5: Drop old column (in separate migration)
ALTER TABLE orders DROP COLUMN status;
```
## Rollback Strategies
### Manual Rollback
For each migration, you may want to write a corresponding "down" migration:
```sql
-- 002_add_user_email.sql (UP)
ALTER TABLE users ADD COLUMN email VARCHAR(320);
-- 002_add_user_email_rollback.sql (DOWN)
ALTER TABLE users DROP COLUMN IF EXISTS email;
```
Store rollback scripts alongside migrations or in separate `rollbacks/` directory.
### Point-in-Time Recovery
**Best strategy**: Restore database from backup to point before bad migration, then re-apply good migrations.
```bash
# pg_restore has no point-in-time option. PITR works at the server level:
# restore a base backup, set recovery_target_time = '2025-03-18 10:30:00'
# in postgresql.conf, create recovery.signal, and start the server
# (this requires WAL archiving to have been enabled beforehand).

# Then re-run migrations up to the last good version
python src/migrate.py  # applies all pending migrations; use a selective runner to stop earlier
```
### Selective Rollback Script
```python
# rollback.py
import sys

from helpers import init_db_connection


def rollback(to_version: int):
    conn = init_db_connection()
    cursor = conn.cursor()

    # Find migrations applied after the target version, newest first
    cursor.execute("""
        SELECT version, name
        FROM schema_migrations
        WHERE version > %s
        ORDER BY version DESC
    """, (to_version,))
    migrations = cursor.fetchall()

    for version, name in migrations:
        # name already includes the .sql extension; the rollback script
        # is assumed to use the same filename under rollbacks/
        rollback_file = f"rollbacks/{name}"
        print(f"Rolling back {name} using {rollback_file}...")
        with open(rollback_file, "r") as f:
            sql = f.read()
        cursor.execute(sql)
        cursor.execute("DELETE FROM schema_migrations WHERE version = %s", (version,))
        conn.commit()
        print(f"  Rolled back {name}")

    conn.close()
    print(f"Rolled back to version {to_version}")


if __name__ == "__main__":
    target = int(sys.argv[1])
    rollback(target)
```
## Automation
### CI/CD Integration
In your deployment pipeline:
```bash
# Before deploying new code, run migrations
python src/migrate.py

# If migrations fail, abort deployment
if [ $? -ne 0 ]; then
  echo "Migrations failed, aborting deployment"
  exit 1
fi

# Deploy new code
fission deploy
```
### Pre-deployment Hooks
Use Fission hooks to run migrations automatically:
```json
{
  "hooks": {
    "function_pre_deploy": [
      {
        "type": "http",
        "url": "http://migration-service/migrate",
        "timeout": 300000
      }
    ]
  }
}
```
Or simpler: run migration as part of `build.sh`:
```bash
#!/bin/sh
# src/build.sh
# Install dependencies
pip3 install -r requirements.txt -t .
# Run migrations against test DB (or do nothing, migrations are separate)
# python ../migrate.py
# Package up
cp -r . ${DEPLOY_PKG}
```
### Database Change Management Tools
Consider specialized tools for larger teams:
- **[Flyway](https://flywaydb.org/)** - Java-based, supports repeatable migrations
- **[Liquibase](https://www.liquibase.org/)** - XML/YAML/JSON migrations
- **[Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate)** - If using Prisma ORM
- **[Alembic](https://alembic.sqlalchemy.org/)** - Python, SQLAlchemy-specific
## Example Workflow
1. **Create migration**:
```bash
touch migrates/004_add_orders_table.sql
```
2. **Write SQL**:
```sql
CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
```
3. **Test locally**:
```bash
createdb test_migration
psql test_migration -f migrates/004_add_orders_table.sql
```
4. **Commit migration file**:
```bash
git add migrates/004_add_orders_table.sql
git commit -m "Add orders table"
```
5. **Apply to staging**:
```bash
# Update dev-deployment.json if new env vars needed
fission deploy --dev
python src/migrate.py
```
6. **Apply to production**:
```bash
# Maintenance window or blue-green deployment
fission deploy
python src/migrate.py
```
## Troubleshooting
### Migration Fails
Check error message:
- **syntax error**: Validate SQL with `psql -c "SQL"` manually
- **duplicate column**: Migration already applied, check `schema_migrations`
- **permission denied**: DB user lacks ALTER/CREATE privileges
- **lock timeout**: Another migration running, wait or kill process
### Migration Already Applied But Failed
If migration was recorded in `schema_migrations` but failed mid-way:
1. Manually revert partial changes or fix broken state
2. Delete row from `schema_migrations`: `DELETE FROM schema_migrations WHERE version = 4;`
3. Re-run migration
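Before re-running, it helps to see exactly which files are still pending by comparing the directory listing against `schema_migrations`. A pure-function sketch (wire it up to `init_db_connection()` and `os.listdir` yourself; the function name is illustrative):

```python
def pending_migrations(files, applied_versions):
    """Return migration filenames not yet recorded in schema_migrations.

    `files` is the listing of migrates/*.sql; `applied_versions` is the
    set of version numbers from SELECT version FROM schema_migrations.
    """
    pending = []
    for name in sorted(files):
        version = int(name.split("_", 1)[0])
        if version not in applied_versions:
            pending.append(name)
    return pending
```

Keeping this logic pure makes it trivial to unit-test without a database.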
### Long-Running Migration
Large table alterations can lock rows and cause downtime:
- Run during low-traffic period
- Use `CONCURRENTLY` for index creation (PostgreSQL):
```sql
CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);
```
- For adding NOT NULL, populate values first with UPDATE, then add constraint
- Consider using `pg_repack` for online table reorganization
## Summary
- Store migrations in `migrates/` directory, numbered sequentially
- Use `init_db_connection()` to run migrations programmatically
- Test migrations on staging database before production
- Keep migrations backward compatible when possible
- Have a rollback plan (backups, down scripts)
- Integrate migrations into CI/CD pipeline