ducnv/claude-gen

Duc Nguyen 29667cd92f ref: up

2026-03-18 20:21:56 +07:00

Database Migrations

This guide covers managing database schema changes in Fission Python projects.

Overview
Migration Files
Applying Migrations
Writing Migrations
Best Practices
Rollback Strategies
Automation

Overview

Database schema changes should be managed through versioned migration scripts, not manual CREATE TABLE statements.

This template uses plain SQL migration files (.sql), which provide:

Version control of schema changes
Repeatable application to different environments
Clear upgrade/downgrade paths
Audit trail of schema evolution

Migration Files

Place SQL migration scripts in the migrates/ directory:

migrates/
├── 001_initial_schema.sql
├── 002_add_user_email.sql
├── 003_create_indexes.sql
└── ...

Naming convention:

Prefix with sequential number (zero-padded for sorting)
Descriptive name after underscore
.sql extension
Numbers should be unique and monotonically increasing
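
The naming rules above can be checked mechanically before a migration is committed. The following is a minimal sketch, not part of this template: `MIGRATION_RE` and `check_migration_names` are hypothetical names, and the pattern assumes lowercase descriptions.

```python
import re

# Hypothetical pattern: numeric prefix, underscore, lowercase description, .sql
MIGRATION_RE = re.compile(r"^(\d+)_([a-z0-9_]+)\.sql$")


def check_migration_names(filenames):
    """Return (version, filename) pairs sorted by version.

    Raises ValueError for a name that breaks the convention or for a
    version number that is used twice.
    """
    seen = {}
    for name in filenames:
        match = MIGRATION_RE.match(name)
        if match is None:
            raise ValueError(f"Bad migration name: {name}")
        version = int(match.group(1))
        if version in seen:
            raise ValueError(
                f"Duplicate version {version}: {seen[version]} and {name}"
            )
        seen[version] = name
    return sorted(seen.items())
```

A check like this could run in CI against the contents of migrates/ so that a duplicate number is caught before it reaches a shared database.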

Initial Schema Example

-- migrates/001_create_items_table.sql
-- Create items table
CREATE TABLE IF NOT EXISTS items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    metadata JSONB,
    created TIMESTAMPTZ DEFAULT NOW(),
    modified TIMESTAMPTZ DEFAULT NOW()
);

-- Add indexes
CREATE INDEX IF NOT EXISTS idx_items_status ON items(status);
CREATE INDEX IF NOT EXISTS idx_items_created ON items(created);

-- Add comments
COMMENT ON TABLE items IS 'Stores item records';
COMMENT ON COLUMN items.status IS 'Item status: active, inactive, pending';

Applying Migrations

Manually

# Connect to database
psql -h localhost -U postgres -d mydb

# Run migration file
\i migrates/001_create_items_table.sql

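# Run all migrations in order (bash script); the glob expands in sorted order
for file in migrates/*.sql; do
    echo "Applying $file..."
    # ON_ERROR_STOP makes psql exit non-zero on the first SQL error
    psql -v ON_ERROR_STOP=1 -h localhost -U postgres -d mydb -f "$file" || break
done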

Automatically from Python

Create a simple migration runner:

# src/migrate.py (not part of function, standalone script)
import os

# init_db_connection is the project's helper that returns an open database connection
from helpers import init_db_connection

def run_migrations():
    conn = init_db_connection()
    cursor = conn.cursor()

    # Create migrations tracking table if not exists
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version INTEGER PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            applied_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    # Get already-applied migrations
    cursor.execute("SELECT version FROM schema_migrations")
    applied = {row[0] for row in cursor.fetchall()}

    # Find migration files
    migrates_dir = os.path.join(os.path.dirname(__file__), "..", "migrates")
    files = sorted([
        f for f in os.listdir(migrates_dir)
        if f.endswith(".sql")
    ])

    # Apply pending migrations
    for filename in files:
        # Extract version number
        version = int(filename.split("_")[0])
        if version in applied:
            print(f"Skipping {filename} (already applied)")
            continue

        path = os.path.join(migrates_dir, filename)
        print(f"Applying {filename}...")
        with open(path, 'r') as f:
            sql = f.read()

        try:
            cursor.execute(sql)
            cursor.execute(
                "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
                (version, filename)
            )
            conn.commit()
            print(f"  ✓ Applied {filename}")
        except Exception as e:
            conn.rollback()
            print(f"  ✗ Failed: {e}")
            raise

    conn.close()
    print("All migrations applied")

if __name__ == "__main__":
    run_migrations()

Run:

python src/migrate.py

Using Migration Tools

For more advanced features (rollbacks, branching), consider:

Alembic - Database migration tool for SQLAlchemy (if using ORM)
pg migrator - Heroku's migration tool
goose - Go-based migration CLI that runs plain SQL migrations against several databases
yoyo-migrations - Python-based migrations

Writing Migrations

Principles

Idempotent - Script should succeed if run multiple times
Additive first - Add columns/tables before removing/dropping
Backward compatible - New schema should work with old code
Atomic - One logical change per migration file
Test locally - Apply to test database before production
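
One way to test the "Idempotent" principle is to apply the same script twice to a throwaway database and confirm that the second run does not fail. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, so the SQL is simplified; `apply_twice` and `MIGRATION_SQL` are hypothetical names.

```python
import sqlite3

# Simplified stand-in for a migration file (SQLite syntax, not PostgreSQL)
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_items_name ON items(name);
"""


def apply_twice(sql):
    """Run the script twice in a fresh in-memory database.

    Returns the names of the tables and indexes that exist afterwards.
    A non-idempotent script raises sqlite3.OperationalError on the second run.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql)
        conn.executescript(sql)  # second run must not raise
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'index') ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Against PostgreSQL the same idea applies: run the file twice against a scratch database and treat any error on the second run as a failed check.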

Common Operations

Create Table

CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Add foreign key
ALTER TABLE orders
ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id)
REFERENCES users(id)
ON DELETE CASCADE;

-- Index for performance
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);

Add Column

-- Add nullable column (safe, backward compatible)
ALTER TABLE orders
ADD COLUMN shipping_address JSONB;

-- Add column with default
-- PostgreSQL 11+ stores a constant default without rewriting the table;
-- older versions, or a volatile default such as random(), rewrite every row
ALTER TABLE orders
ADD COLUMN tax_amount DECIMAL(10,2) DEFAULT 0.00;

Rename Column

-- RENAME COLUMN works on all supported PostgreSQL versions,
-- but it breaks any deployed code that still uses the old name
ALTER TABLE orders
RENAME COLUMN total TO order_total;

Modify Column Type

-- Change VARCHAR length
ALTER TABLE users
ALTER COLUMN email TYPE VARCHAR(320);

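-- Convert to a different type
-- USING is only required when there is no implicit cast (e.g. text to integer);
-- it is shown here with an explicit cast for illustration
ALTER TABLE orders
ALTER COLUMN status TYPE VARCHAR(100)
USING status::VARCHAR(100);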

Create Index

-- Simple index
CREATE INDEX idx_users_email ON users(email);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only active users)
CREATE INDEX idx_users_active ON users(id)
WHERE status = 'active';

-- Multi-column index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

Drop Column/Table

-- First, make sure no deployed code still reads or writes it
-- Consider removing all references in code first, then dropping in a subsequent migration

-- Drop column
ALTER TABLE orders
DROP COLUMN IF EXISTS old_column;

-- Drop table (dangerous!)
DROP TABLE IF EXISTS old_logs;

Data Migrations

Sometimes you need to transform data:

-- Backfill new column from existing data
UPDATE orders
SET shipping_address = jsonb_build_object(
    'street', address_street,
    'city', address_city,
    'zip', address_zip
)
WHERE shipping_address IS NULL;

-- Migrate enum values
UPDATE products
SET status = 'active' WHERE status = 'ACTIVE';

-- Clean up duplicates
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);
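
On large tables a single UPDATE can hold locks for a long time, so data migrations are often run in batches with a commit between each one. The following is a minimal sketch of that pattern for the status normalisation above. It uses sqlite3 as a stand-in so it can run anywhere; `backfill_in_batches` is a hypothetical helper, and with psycopg2 the placeholder would be `%s` instead of `?`.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Rewrite products.status 'ACTIVE' to 'active', batch_size rows at a time.

    Commits after every batch so locks are released between batches.
    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE products SET status = 'active'
            WHERE id IN (
                SELECT id FROM products WHERE status = 'ACTIVE' LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:  # nothing left to update
            return total
        total += cur.rowcount
```

Because each batch only selects rows that still need the change, the function can be interrupted and restarted without redoing finished work.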

Transactional Migrations

Wrap critical migrations in transactions:

BEGIN;

-- Multiple related operations
ALTER TABLE orders ADD COLUMN shipping_id UUID;
UPDATE orders SET shipping_id = gen_random_uuid() WHERE shipping_id IS NULL;
ALTER TABLE orders ALTER COLUMN shipping_id SET NOT NULL;

COMMIT;

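Note: Most DDL statements in PostgreSQL are transactional, so a failed step inside BEGIN/COMMIT rolls back the whole migration. A few commands, such as CREATE INDEX CONCURRENTLY and CREATE DATABASE, cannot run inside a transaction block and need their own migration file applied outside a transaction. DDL also holds its locks until the transaction ends, so keep transactional migrations short. To stop two deployments from migrating at the same time, consider advisory locks or deployment coordination.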

Best Practices

✅ Do's

Test migrations on copy of production database before applying to prod
Keep migrations small - One logical change per file
Write data migrations as separate files from schema migrations
Use IF NOT EXISTS and IF EXISTS to make migrations idempotent
Drop columns/tables in a later migration than the one that adds their replacement - Separating them allows rollback
Document why - Add comments explaining the purpose
Consider indexes - Add indexes for frequently queried columns in same migration as table creation
Use UUIDs for primary keys (gen_random_uuid() in PostgreSQL 13+)
Add created_at and updated_at timestamps to all tables
Version numbers must be unique and sequential

❌ Don'ts

Don't modify already-applied migrations - They're part of history
Don't skip version numbers - Gaps are not critical, but they make the history harder to follow
Don't use destructive operations without backup - DROP COLUMN, DROP TABLE
Don't run long-running migrations during peak hours - Use low-traffic windows
Don't add NOT NULL without default on non-empty tables - Will fail due to existing NULL rows
Don't assume order of execution - Always number sequentially
Don't mix unrelated changes in one migration file
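
The first rule in the list above (don't modify already-applied migrations) can be enforced by recording a checksum when a file is applied and comparing it later. This is a sketch under an assumption: the schema_migrations table in this guide has no checksum column, so the `recorded` mapping below stands for a hypothetical extra column. `file_checksum` and `find_modified` are illustrative names.

```python
import hashlib


def file_checksum(sql_text):
    """SHA-256 hex digest of a migration's contents."""
    return hashlib.sha256(sql_text.encode("utf-8")).hexdigest()


def find_modified(recorded, current_files):
    """Return names of applied migrations whose contents have changed.

    recorded: {filename: checksum stored when the migration was applied}
    current_files: {filename: SQL text currently on disk}
    """
    return sorted(
        name
        for name, checksum in recorded.items()
        if name in current_files
        and file_checksum(current_files[name]) != checksum
    )
```

A runner could call `find_modified` before applying anything and refuse to continue if the result is not empty.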

Zero-Downtime Migrations

Adding Column

-- Step 1: Add column as nullable or with default (fast)
ALTER TABLE orders ADD COLUMN status VARCHAR(50);

-- Step 2: Deploy code that writes to new column
-- Your application updates to populate status

-- Step 3: Backfill existing rows (if needed)
UPDATE orders SET status = 'completed' WHERE status IS NULL AND shipped_at IS NOT NULL;

-- Step 4: Make column NOT NULL (if needed) - only after all rows have values
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;

Renaming Column

-- Step 1: Add new column
ALTER TABLE orders ADD COLUMN order_status VARCHAR(50);

-- Step 2: Deploy code writing to both old and new columns (dual-write)

-- Step 3: Backfill data
UPDATE orders SET order_status = status;

-- Step 4: Deploy code reading from new column, stop writing to old

-- Step 5: Drop old column (in separate migration)
ALTER TABLE orders DROP COLUMN status;

Rollback Strategies

Manual Rollback

For each migration, you may want to write a corresponding "down" migration:

-- 002_add_user_email.sql (UP)
ALTER TABLE users ADD COLUMN email VARCHAR(320);

-- 002_add_user_email_rollback.sql (DOWN)
ALTER TABLE users DROP COLUMN IF EXISTS email;

Store rollback scripts alongside migrations or in separate rollbacks/ directory.
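
Since the down script may live in either place, a small lookup helper can hide the difference from the rest of the tooling. A minimal sketch, assuming the `_rollback.sql` suffix shown above for scripts stored alongside migrations and an identical filename for scripts stored in rollbacks/; `find_rollback` is a hypothetical helper.

```python
from pathlib import Path


def find_rollback(migration_name, migrates_dir="migrates", rollbacks_dir="rollbacks"):
    """Return the Path of the down script for migration_name, or None.

    Looks for <stem>_rollback.sql next to the migration first,
    then for <stem>.sql in the rollbacks directory.
    """
    stem = Path(migration_name).stem  # e.g. "002_add_user_email"
    candidates = [
        Path(migrates_dir) / f"{stem}_rollback.sql",
        Path(rollbacks_dir) / f"{stem}.sql",
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Note that if rollback scripts are stored inside migrates/, a runner that applies every .sql file in that directory would also need to skip files ending in `_rollback.sql`.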

Point-in-Time Recovery

Best strategy: Restore database from backup to point before bad migration, then re-apply good migrations.

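# pg_restore has no point-in-time option; PITR replays archived WAL on top of a base backup.
# 1. Stop PostgreSQL and restore the base backup into the data directory
# 2. In postgresql.conf (PostgreSQL 12+) set restore_command for your WAL archive and:
#      recovery_target_time = '2025-03-18 10:30:00'
# 3. Create recovery.signal in the data directory, then start PostgreSQL
touch "$PGDATA/recovery.signal"

# Re-apply migrations up to the last good version.
# Note: src/migrate.py applies every pending file, so remove or fix the bad migration first.
python src/migrate.py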
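
Re-applying migrations only up to a known-good version requires filtering the file list by version number. A minimal sketch, assuming the naming convention from this guide; `version_of` and `migrations_up_to` are hypothetical helpers and not part of src/migrate.py.

```python
def version_of(filename):
    """Numeric prefix of a migration filename, e.g. 2 for 002_add_user_email.sql."""
    return int(filename.split("_", 1)[0])


def migrations_up_to(filenames, target_version):
    """Return .sql migrations with version <= target_version, ordered by version."""
    sql_files = [name for name in filenames if name.endswith(".sql")]
    return [
        name
        for name in sorted(sql_files, key=version_of)
        if version_of(name) <= target_version
    ]
```

The runner's loop could iterate over `migrations_up_to(os.listdir(migrates_dir), target)` instead of every file when a target version is passed on the command line.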

Selective Rollback Script

# rollback.py
import os
import sys
from helpers import init_db_connection

def rollback(to_version: int):
    conn = init_db_connection()
    cursor = conn.cursor()

    # Find applied migrations newer than the target version, newest first
    cursor.execute("""
        SELECT version, name
        FROM schema_migrations
        WHERE version > %s
        ORDER BY version DESC
    """, (to_version,))

    migrations = cursor.fetchall()

    try:
        for version, name in migrations:
            # name is the migration filename (e.g. 002_add_user_email.sql);
            # the down script is expected under the same name in rollbacks/
            rollback_file = os.path.join("rollbacks", name)
            print(f"Rolling back {name} using {rollback_file}...")
            with open(rollback_file, 'r') as f:
                sql = f.read()
            cursor.execute(sql)
            cursor.execute("DELETE FROM schema_migrations WHERE version = %s", (version,))
            conn.commit()
            print(f"  Rolled back {name}")
    except Exception:
        # Undo the partially applied down script before re-raising
        conn.rollback()
        raise
    finally:
        conn.close()

    print(f"Rolled back to version {to_version}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("Usage: python rollback.py <target_version>")
    rollback(int(sys.argv[1]))

Automation

CI/CD Integration

In your deployment pipeline:

# Before deploying new code, apply migrations; abort the deployment if they fail
if ! python src/migrate.py; then
    echo "Migrations failed, aborting deployment"
    exit 1
fi

# Deploy new code
fission deploy

Pre-deployment Hooks

Use Fission hooks to run migrations automatically:

{
  "hooks": {
    "function_pre_deploy": [
      {
        "type": "http",
        "url": "http://migration-service/migrate",
        "timeout": 300000
      }
    ]
  }
}

Or simpler: run migration as part of build.sh:

#!/bin/sh
# src/build.sh

# Install dependencies
pip3 install -r requirements.txt -t .

# Migrations normally run separately from the build; uncomment only to target a test DB
# python3 migrate.py

# Package up
cp -r . "${DEPLOY_PKG}"

Database Change Management Tools

Consider specialized tools for larger teams:

Flyway - Java-based, supports repeatable migrations
Liquibase - XML/YAML/JSON migrations
Prisma Migrate - If using Prisma ORM
Alembic - Python, SQLAlchemy-specific

Example Workflow

Create migration:

touch migrates/004_add_orders_table.sql

Write SQL:

CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id);

Test locally:

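createdb test_migration
# Apply every migration in order so tables referenced by 004 (e.g. users) exist
for f in migrates/*.sql; do psql -v ON_ERROR_STOP=1 test_migration -f "$f"; done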

Commit migration file:

git add migrates/004_add_orders_table.sql
git commit -m "Add orders table"

Apply to staging:

# Update dev-deployment.json if new env vars needed
# Migrate first so the new code finds the schema it expects
python src/migrate.py
fission deploy --dev

Apply to production:

# Maintenance window or blue-green deployment
# Migrate first, and deploy only if the migrations succeed
python src/migrate.py
fission deploy

Troubleshooting

Migration Fails

Check error message:

syntax error: Validate SQL with psql -c "SQL" manually
duplicate column: Migration already applied, check schema_migrations
permission denied: DB user lacks ALTER/CREATE privileges
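lock timeout: Another session holds a conflicting lock (often another migration or a long-running transaction); check pg_stat_activity, then wait or terminate that session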

Migration Already Applied But Failed

If migration was recorded in schema_migrations but failed mid-way:

Manually revert partial changes or fix broken state
Delete row from schema_migrations: DELETE FROM schema_migrations WHERE version = 4;
Re-run migration

Long-Running Migration

Large table alterations can take an ACCESS EXCLUSIVE lock that blocks reads and writes on the table, causing downtime:

Run during low-traffic period

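Use CONCURRENTLY for index creation (PostgreSQL). It cannot run inside a transaction block, so apply such a file with psql or an autocommit connection: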

CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);

For adding NOT NULL, populate values first with UPDATE, then add constraint
Consider using pg_repack for online table reorganization
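
Another common mitigation is to make a blocked statement fail fast and retry it later, so that a waiting ALTER TABLE does not queue other sessions behind it. In PostgreSQL this is usually paired with `SET lock_timeout = '5s'` at the top of the migration. The sketch below shows only the retry side and makes no database calls: `retry_on_lock_timeout` is a hypothetical helper, `run` is whatever callable applies the migration, and `is_lock_timeout` is a caller-supplied check (with psycopg2 it would typically test for SQLSTATE 55P03).

```python
import time


def retry_on_lock_timeout(run, is_lock_timeout, attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call run(); on a lock-timeout error wait and retry with a doubling delay.

    Errors that are not lock timeouts, and the error from the final attempt,
    are re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return run()
        except Exception as exc:
            if not is_lock_timeout(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` is enough.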

Summary

Store migrations in migrates/ directory, numbered sequentially
Use init_db_connection() to run migrations programmatically
Test migrations on staging database before production
Keep migrations backward compatible when possible
Have a rollback plan (backups, down scripts)
Integrate migrations into CI/CD pipeline
