# Database Migrations

This guide covers managing database schema changes in Fission Python projects.

## Table of Contents

1. [Overview](#overview)
2. [Migration Files](#migration-files)
3. [Applying Migrations](#applying-migrations)
4. [Writing Migrations](#writing-migrations)
5. [Best Practices](#best-practices)
6. [Rollback Strategies](#rollback-strategies)
7. [Automation](#automation)

## Overview

Database schema changes should be managed through versioned migration scripts, not manual `CREATE TABLE` statements. This template uses **plain SQL migration files** (`.sql`), which provide:

- Version control of schema changes
- Repeatable application to different environments
- Clear upgrade/downgrade paths
- An audit trail of schema evolution

## Migration Files

Place SQL migration scripts in the `migrates/` directory:

```
migrates/
├── 001_initial_schema.sql
├── 002_add_user_email.sql
├── 003_create_indexes.sql
└── ...
```

**Naming convention**:

- Prefix with a sequential number (zero-padded for sorting)
- Descriptive name after the underscore
- `.sql` extension
- Numbers must be unique and monotonically increasing

### Initial Schema Example

```sql
-- migrates/001_create_items_table.sql

-- Create items table
CREATE TABLE IF NOT EXISTS items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    metadata JSONB,
    created TIMESTAMPTZ DEFAULT NOW(),
    modified TIMESTAMPTZ DEFAULT NOW()
);

-- Add indexes
CREATE INDEX idx_items_status ON items(status);
CREATE INDEX idx_items_created ON items(created);

-- Add comments
COMMENT ON TABLE items IS 'Stores item records';
COMMENT ON COLUMN items.status IS 'Item status: active, inactive, pending';
```

## Applying Migrations

### Manually

```bash
# Connect to database
psql -h localhost -U postgres -d mydb

# Run migration file
\i migrates/001_create_items_table.sql

# Run all migrations in order (bash script; glob expansion sorts filenames)
for file in migrates/*.sql; do
    echo "Applying $file..."
    psql -h localhost -U postgres -d mydb -f "$file"
done
```

### Automatically from Python

Create a simple migration runner:

```python
# src/migrate.py (not part of the function, standalone script)
import os

from helpers import init_db_connection


def run_migrations():
    conn = init_db_connection()
    cursor = conn.cursor()

    # Create migrations tracking table if it does not exist
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version INTEGER PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            applied_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    # Get already-applied migrations
    cursor.execute("SELECT version FROM schema_migrations")
    applied = {row[0] for row in cursor.fetchall()}

    # Find migration files
    migrates_dir = os.path.join(os.path.dirname(__file__), "..", "migrates")
    files = sorted(
        f for f in os.listdir(migrates_dir) if f.endswith(".sql")
    )

    # Apply pending migrations
    for filename in files:
        # Extract version number
        version = int(filename.split("_")[0])

        if version in applied:
            print(f"Skipping {filename} (already applied)")
            continue

        path = os.path.join(migrates_dir, filename)
        print(f"Applying {filename}...")

        with open(path, 'r') as f:
            sql = f.read()

        try:
            cursor.execute(sql)
            cursor.execute(
                "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
                (version, filename)
            )
            conn.commit()
            print(f"  ✓ Applied {filename}")
        except Exception as e:
            conn.rollback()
            print(f"  ✗ Failed: {e}")
            raise

    conn.close()
    print("All migrations applied")


if __name__ == "__main__":
    run_migrations()
```

Run:

```bash
python src/migrate.py
```

### Using Migration Tools

For more advanced features (rollbacks, branching), consider:

- **[Alembic](https://alembic.sqlalchemy.org/)** - Database migration tool for SQLAlchemy (if using an ORM)
- **[pg-migrator](https://github.com/heroku/pg-migrator)** - Heroku's migration tool
- **[goose](https://github.com/pressly/goose)** - Multi-database migration tool (can be driven from Python)
- **[yoyo-migrations](https://github.com/gugulet-h/yoyo-migrations)** - Python-based migrations

## Writing Migrations

### Principles

1. **Idempotent** - A script should succeed if run multiple times
2. **Additive first** - Add columns/tables before removing/dropping
3. **Backward compatible** - The new schema should work with the old code
4. **Atomic** - One logical change per migration file
5. **Test locally** - Apply to a test database before production

### Common Operations

#### Create Table

```sql
CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Add foreign key
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_user
    FOREIGN KEY (user_id) REFERENCES users(id)
    ON DELETE CASCADE;

-- Indexes for performance
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
```

#### Add Column

```sql
-- Add nullable column (safe, backward compatible)
ALTER TABLE orders ADD COLUMN shipping_address JSONB;

-- Add column with default (be careful with large tables!)
-- Before PostgreSQL 11 this rewrites the entire table - use cautiously
ALTER TABLE orders ADD COLUMN tax_amount DECIMAL(10,2) DEFAULT 0.00;
```

#### Rename Column

```sql
-- PostgreSQL supports RENAME COLUMN directly
ALTER TABLE orders RENAME COLUMN total TO order_total;
```

#### Modify Column Type

```sql
-- Change VARCHAR length
ALTER TABLE users ALTER COLUMN email TYPE VARCHAR(320);

-- Convert to a different type (use the USING clause)
ALTER TABLE orders ALTER COLUMN status TYPE VARCHAR(100) USING status::VARCHAR(100);
```

#### Create Index

```sql
-- Simple index
CREATE INDEX idx_users_email ON users(email);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only active users)
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';

-- Multi-column index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
```

#### Drop Column/Table

```sql
-- First, ensure nothing is still using it
-- Consider dropping in a later migration, after all code references are removed

-- Drop column
ALTER TABLE orders DROP COLUMN IF EXISTS old_column;

-- Drop table (dangerous!)
DROP TABLE IF EXISTS old_logs;
```

### Data Migrations

Sometimes you need to transform data:

```sql
-- Backfill new column from existing data
UPDATE orders
SET shipping_address = jsonb_build_object(
    'street', address_street,
    'city', address_city,
    'zip', address_zip
)
WHERE shipping_address IS NULL;

-- Migrate enum values
UPDATE products SET status = 'active' WHERE status = 'ACTIVE';

-- Clean up duplicates
WITH duplicates AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);
```

### Transactional Migrations

Wrap critical migrations in transactions:

```sql
BEGIN;

-- Multiple related operations
ALTER TABLE orders ADD COLUMN shipping_id UUID;
UPDATE orders SET shipping_id = uuid_generate_v4() WHERE shipping_id IS NULL;
ALTER TABLE orders ALTER COLUMN shipping_id SET NOT NULL;

COMMIT;
```

**Note**: Unlike some databases, PostgreSQL has transactional DDL, so `BEGIN`/`COMMIT` works for most schema changes. Exceptions such as `CREATE INDEX CONCURRENTLY` and `CREATE DATABASE` cannot run inside a transaction. For complex multi-step changes, consider using advisory locks or deployment coordination.

## Best Practices

### ✅ Do's

1. **Test migrations on a copy of the production database** before applying to prod
2. **Keep migrations small** - One logical change per file
3. **Write data migrations as separate files** from schema migrations
4. **Use `IF NOT EXISTS` and `IF EXISTS`** to make migrations idempotent
5. **Never drop columns/tables in the same migration you add them** - Separate them to allow rollback
6. **Document why** - Add comments explaining the purpose
7. **Consider indexes** - Add indexes for frequently queried columns in the same migration as table creation
8. **Use UUIDs** for primary keys (`gen_random_uuid()` in PostgreSQL 13+)
9. **Add `created_at` and `updated_at` timestamps** to all tables
10. **Version numbers must be unique and sequential**

### ❌ Don'ts

1. **Don't modify already-applied migrations** - They're part of history
2. **Don't skip version numbers** - Gaps make the migration history confusing
3. **Don't use destructive operations without a backup** - `DROP COLUMN`, `DROP TABLE`
4. **Don't run long-running migrations during peak hours** - Use low-traffic windows
5. **Don't add NOT NULL without a default** on non-empty tables - It will fail on existing NULL rows
6. **Don't assume order of execution** - Always number sequentially
7. **Don't mix unrelated changes** in one migration file

### Zero-Downtime Migrations

#### Adding a Column

```sql
-- Step 1: Add column as nullable or with a default (fast)
ALTER TABLE orders ADD COLUMN status VARCHAR(50);

-- Step 2: Deploy code that writes to the new column
-- Your application updates to populate status

-- Step 3: Backfill existing rows (if needed)
UPDATE orders SET status = 'completed' WHERE status IS NULL AND shipped_at IS NOT NULL;

-- Step 4: Make column NOT NULL (if needed) - only after all rows have values
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```

#### Renaming a Column

```sql
-- Step 1: Add new column
ALTER TABLE orders ADD COLUMN order_status VARCHAR(50);

-- Step 2: Deploy code writing to both old and new columns (dual-write)

-- Step 3: Backfill data
UPDATE orders SET order_status = status;

-- Step 4: Deploy code reading from the new column, stop writing to the old one

-- Step 5: Drop old column (in a separate migration)
ALTER TABLE orders DROP COLUMN status;
```

## Rollback Strategies

### Manual Rollback

For each migration, you may want to write a corresponding "down" migration:

```sql
-- 002_add_user_email.sql (UP)
ALTER TABLE users ADD COLUMN email VARCHAR(320);

-- 002_add_user_email_rollback.sql (DOWN)
ALTER TABLE users DROP COLUMN IF EXISTS email;
```

Store rollback scripts alongside migrations or in a separate `rollbacks/` directory.

### Point-in-Time Recovery

**Best strategy**: Restore the database from backup to a point before the bad migration, then re-apply the good migrations.
```bash
# Point-in-time recovery requires WAL archiving: restore the base backup,
# then set a recovery target before starting the server, e.g. in postgresql.conf:
#   recovery_target_time = '2025-03-18 10:30:00'
# (pg_restore alone cannot replay to a point in time)

# Re-run migrations up to the good version
# Note: src/migrate.py applies all pending migrations; selective re-apply
# requires removing the unwanted rows from schema_migrations first
python src/migrate.py
```

### Selective Rollback Script

```python
# rollback.py
import sys

from helpers import init_db_connection


def rollback(to_version: int):
    conn = init_db_connection()
    cursor = conn.cursor()

    # Find migrations after the target version
    cursor.execute("""
        SELECT version, name FROM schema_migrations
        WHERE version > %s
        ORDER BY version DESC
    """, (to_version,))
    migrations = cursor.fetchall()

    for version, name in migrations:
        # name already includes the .sql extension, e.g. "002_add_user_email.sql"
        rollback_file = f"rollbacks/{name}"
        print(f"Rolling back {name} using {rollback_file}...")

        with open(rollback_file, 'r') as f:
            sql = f.read()

        cursor.execute(sql)
        cursor.execute("DELETE FROM schema_migrations WHERE version = %s", (version,))
        conn.commit()
        print(f"  Rolled back {name}")

    conn.close()
    print(f"Rolled back to version {to_version}")


if __name__ == "__main__":
    target = int(sys.argv[1])
    rollback(target)
```

## Automation

### CI/CD Integration

In your deployment pipeline:

```bash
# Before deploying new code
python src/migrate.py

# If migrations fail, abort deployment
if [ $? -ne 0 ]; then
    echo "Migrations failed, aborting deployment"
    exit 1
fi

# Deploy new code
fission deploy
```

### Pre-deployment Hooks

Use Fission hooks to run migrations automatically:

```json
{
  "hooks": {
    "function_pre_deploy": [
      {
        "type": "http",
        "url": "http://migration-service/migrate",
        "timeout": 300000
      }
    ]
  }
}
```

Or simpler: run the migration as part of `build.sh`:

```bash
#!/bin/sh
# src/build.sh

# Install dependencies
pip3 install -r requirements.txt -t .

# Run migrations against test DB (or do nothing, migrations are separate)
# python ../migrate.py

# Package up
cp -r . ${DEPLOY_PKG}
```

### Database Change Management Tools

Consider specialized tools for larger teams:

- **[Flyway](https://flywaydb.org/)** - Java-based, supports repeatable migrations
- **[Liquibase](https://www.liquibase.org/)** - XML/YAML/JSON migrations
- **[Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate)** - If using the Prisma ORM
- **[Alembic](https://alembic.sqlalchemy.org/)** - Python, SQLAlchemy-specific

## Example Workflow

1. **Create migration**:

   ```bash
   touch migrates/004_add_orders_table.sql
   ```

2. **Write SQL**:

   ```sql
   CREATE TABLE orders (
       id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
       user_id UUID NOT NULL REFERENCES users(id),
       total DECIMAL(10,2) NOT NULL,
       status VARCHAR(50) DEFAULT 'pending',
       created_at TIMESTAMPTZ DEFAULT NOW()
   );
   CREATE INDEX idx_orders_user_id ON orders(user_id);
   ```

3. **Test locally**:

   ```bash
   createdb test_migration
   psql test_migration -f migrates/004_add_orders_table.sql
   ```

4. **Commit migration file**:

   ```bash
   git add migrates/004_add_orders_table.sql
   git commit -m "Add orders table"
   ```

5. **Apply to staging**:

   ```bash
   # Update dev-deployment.json if new env vars are needed
   fission deploy --dev
   python src/migrate.py
   ```

6. **Apply to production**:

   ```bash
   # Maintenance window or blue-green deployment
   fission deploy
   python src/migrate.py
   ```

## Troubleshooting

### Migration Fails

Check the error message:

- **syntax error**: Validate the SQL manually with `psql -c "SQL"`
- **duplicate column**: Migration already applied; check `schema_migrations`
- **permission denied**: DB user lacks ALTER/CREATE privileges
- **lock timeout**: Another migration is running; wait or kill the blocking process

### Migration Already Applied But Failed

If a migration was recorded in `schema_migrations` but failed mid-way:

1. Manually revert the partial changes or fix the broken state
2. Delete the row from `schema_migrations`: `DELETE FROM schema_migrations WHERE version = 4;`
3. Re-run the migration

### Long-Running Migration

Large table alterations can take heavy locks and cause downtime:

- Run during a low-traffic period
- Use `CONCURRENTLY` for index creation (PostgreSQL):

  ```sql
  CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);
  ```

- For adding NOT NULL, populate values first with UPDATE, then add the constraint
- Consider `pg_repack` for online table reorganization

## Summary

- Store migrations in the `migrates/` directory, numbered sequentially
- Use `init_db_connection()` to run migrations programmatically
- Test migrations on a staging database before production
- Keep migrations backward compatible when possible
- Have a rollback plan (backups, down scripts)
- Integrate migrations into the CI/CD pipeline
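The naming rules above (unique, zero-padded, monotonically increasing version numbers) are easy to enforce with a small check before applying anything. This is a minimal sketch; `check_versions` is a hypothetical helper, not part of the template, and the filename pattern is an assumption based on the convention described in Migration Files:

```python
# Sketch: validate migration filenames against the naming convention
# before running them. check_versions is a hypothetical helper.
import re


def check_versions(filenames):
    """Return version numbers in order, raising on bad or conflicting names."""
    versions = []
    for name in sorted(filenames):  # lexicographic order, as a runner would see them
        match = re.match(r"^(\d+)_[\w-]+\.sql$", name)
        if not match:
            raise ValueError(f"Bad migration filename: {name}")
        versions.append(int(match.group(1)))
    if len(set(versions)) != len(versions):
        raise ValueError("Duplicate migration version numbers")
    if versions != sorted(versions):
        # Happens when prefixes are not zero-padded, e.g. 10_ sorts before 2_
        raise ValueError("Versions not monotonically increasing in sort order")
    return versions
```

A check like this can run in CI so that a duplicate or unpadded prefix fails the build before it ever reaches a database.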
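The advisory-lock coordination mentioned under Transactional Migrations can be sketched as follows. PostgreSQL's `pg_advisory_lock`/`pg_advisory_unlock` serialize sessions on an arbitrary integer key; `MIGRATION_LOCK_KEY` here is a made-up project constant, and the connection is assumed to come from the template's `init_db_connection()`:

```python
# Sketch: prevent two deployers from running migrations concurrently by
# holding a PostgreSQL session-level advisory lock around the run.
from contextlib import contextmanager

MIGRATION_LOCK_KEY = 724242  # hypothetical project-wide constant (any int64)


@contextmanager
def migration_lock(conn):
    cursor = conn.cursor()
    # Blocks until no other session holds the lock on this key
    cursor.execute("SELECT pg_advisory_lock(%s)", (MIGRATION_LOCK_KEY,))
    try:
        yield
    finally:
        cursor.execute("SELECT pg_advisory_unlock(%s)", (MIGRATION_LOCK_KEY,))
```

In a runner like `src/migrate.py`, the apply loop would then sit inside `with migration_lock(conn): ...`, so a second deployment simply waits instead of racing the first.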