# Database Migrations

This guide covers managing database schema changes in Fission Python projects.

## Table of Contents

1. [Overview](#overview)
2. [Migration Files](#migration-files)
3. [Applying Migrations](#applying-migrations)
4. [Writing Migrations](#writing-migrations)
5. [Best Practices](#best-practices)
6. [Rollback Strategies](#rollback-strategies)
7. [Automation](#automation)

## Overview

Database schema changes should be managed through versioned migration scripts, not manual `CREATE TABLE` statements. This template uses **plain SQL migration files** (`.sql`), which provide:

- Version control of schema changes
- Repeatable application to different environments
- Clear upgrade/downgrade paths
- An audit trail of schema evolution

## Migration Files

Place SQL migration scripts in the `migrates/` directory:

```
migrates/
├── 001_initial_schema.sql
├── 002_add_user_email.sql
├── 003_create_indexes.sql
└── ...
```

**Naming convention**:

- Prefix with a sequential number (zero-padded for sorting)
- Descriptive name after the underscore
- `.sql` extension
- Numbers must be unique and monotonically increasing

### Initial Schema Example

```sql
-- migrates/001_create_items_table.sql

-- Create items table
CREATE TABLE IF NOT EXISTS items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    metadata JSONB,
    created TIMESTAMPTZ DEFAULT NOW(),
    modified TIMESTAMPTZ DEFAULT NOW()
);

-- Add indexes
CREATE INDEX idx_items_status ON items(status);
CREATE INDEX idx_items_created ON items(created);

-- Add comments
COMMENT ON TABLE items IS 'Stores item records';
COMMENT ON COLUMN items.status IS 'Item status: active, inactive, pending';
```

## Applying Migrations

### Manually

```bash
# Connect to database
psql -h localhost -U postgres -d mydb

# Run migration file
\i migrates/001_create_items_table.sql

# Run all migrations in order (bash script; glob expansion sorts filenames)
for file in migrates/*.sql; do
    echo "Applying $file..."
    psql -h localhost -U postgres -d mydb -f "$file"
done
```

### Automatically from Python

Create a simple migration runner:

```python
# src/migrate.py (not part of the function, standalone script)
import os

from helpers import init_db_connection


def run_migrations():
    conn = init_db_connection()
    cursor = conn.cursor()

    # Create migrations tracking table if it does not exist
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version INTEGER PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            applied_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    # Get already-applied migrations
    cursor.execute("SELECT version FROM schema_migrations")
    applied = {row[0] for row in cursor.fetchall()}

    # Find migration files
    migrates_dir = os.path.join(os.path.dirname(__file__), "..", "migrates")
    files = sorted(
        f for f in os.listdir(migrates_dir) if f.endswith(".sql")
    )

    # Apply pending migrations
    for filename in files:
        # Extract version number
        version = int(filename.split("_")[0])

        if version in applied:
            print(f"Skipping {filename} (already applied)")
            continue

        path = os.path.join(migrates_dir, filename)
        print(f"Applying {filename}...")

        with open(path, 'r') as f:
            sql = f.read()

        try:
            cursor.execute(sql)
            cursor.execute(
                "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
                (version, filename)
            )
            conn.commit()
            print(f"  ✓ Applied {filename}")
        except Exception as e:
            conn.rollback()
            print(f"  ✗ Failed: {e}")
            raise

    conn.close()
    print("All migrations applied")


if __name__ == "__main__":
    run_migrations()
```

Run:

```bash
python src/migrate.py
```

### Using Migration Tools

For more advanced features (rollbacks, branching), consider:

- **[Alembic](https://alembic.sqlalchemy.org/)** - Database migration tool for SQLAlchemy (if using an ORM)
- **[pg-migrator](https://github.com/heroku/pg-migrator)** - Heroku's migration tool
- **[goose](https://github.com/pressly/goose)** - Multi-database migration tool (can be driven from Python)
- **[yoyo-migrations](https://github.com/gugulet-h/yoyo-migrations)** - Python-based migrations

## Writing Migrations

### Principles

1. **Idempotent** - A script should succeed if run multiple times
2. **Additive first** - Add columns/tables before removing/dropping
3. **Backward compatible** - The new schema should work with the old code
4. **Atomic** - One logical change per migration file
5. **Test locally** - Apply to a test database before production

### Common Operations

#### Create Table

```sql
CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Add foreign key
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_user
    FOREIGN KEY (user_id) REFERENCES users(id)
    ON DELETE CASCADE;

-- Indexes for performance
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
```

#### Add Column

```sql
-- Add nullable column (safe, backward compatible)
ALTER TABLE orders ADD COLUMN shipping_address JSONB;

-- Add column with default (be careful with large tables!)
-- Before PostgreSQL 11 this rewrites the entire table - use cautiously
ALTER TABLE orders ADD COLUMN tax_amount DECIMAL(10,2) DEFAULT 0.00;
```

#### Rename Column

```sql
-- PostgreSQL supports RENAME COLUMN directly
ALTER TABLE orders RENAME COLUMN total TO order_total;
```

#### Modify Column Type

```sql
-- Change VARCHAR length
ALTER TABLE users ALTER COLUMN email TYPE VARCHAR(320);

-- Convert to a different type (use the USING clause)
ALTER TABLE orders ALTER COLUMN status TYPE VARCHAR(100) USING status::VARCHAR(100);
```

#### Create Index

```sql
-- Simple index
CREATE INDEX idx_users_email ON users(email);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only active users)
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';

-- Multi-column index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
```

#### Drop Column/Table

```sql
-- First, ensure nothing is still using it
-- Consider dropping in a later migration, after all code references are removed

-- Drop column
ALTER TABLE orders DROP COLUMN IF EXISTS old_column;

-- Drop table (dangerous!)
DROP TABLE IF EXISTS old_logs;
```

### Data Migrations

Sometimes you need to transform data:

```sql
-- Backfill new column from existing data
UPDATE orders
SET shipping_address = jsonb_build_object(
    'street', address_street,
    'city', address_city,
    'zip', address_zip
)
WHERE shipping_address IS NULL;

-- Migrate enum values
UPDATE products SET status = 'active' WHERE status = 'ACTIVE';

-- Clean up duplicates
WITH duplicates AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);
```

### Transactional Migrations

Wrap critical migrations in transactions:

```sql
BEGIN;

-- Multiple related operations
ALTER TABLE orders ADD COLUMN shipping_id UUID;
UPDATE orders SET shipping_id = uuid_generate_v4() WHERE shipping_id IS NULL;
ALTER TABLE orders ALTER COLUMN shipping_id SET NOT NULL;

COMMIT;
```

**Note**: Unlike some databases, PostgreSQL has transactional DDL, so `BEGIN`/`COMMIT` works for most schema changes. Exceptions such as `CREATE INDEX CONCURRENTLY` and `CREATE DATABASE` cannot run inside a transaction. For complex multi-step changes, consider using advisory locks or deployment coordination.

## Best Practices

### ✅ Do's

1. **Test migrations on a copy of the production database** before applying to prod
2. **Keep migrations small** - One logical change per file
3. **Write data migrations as separate files** from schema migrations
4. **Use `IF NOT EXISTS` and `IF EXISTS`** to make migrations idempotent
5. **Never drop columns/tables in the same migration you add them** - Separate them to allow rollback
6. **Document why** - Add comments explaining the purpose
7. **Consider indexes** - Add indexes for frequently queried columns in the same migration as table creation
8. **Use UUIDs** for primary keys (`gen_random_uuid()` in PostgreSQL 13+)
9. **Add `created_at` and `updated_at` timestamps** to all tables
10. **Version numbers must be unique and sequential**

### ❌ Don'ts

1. **Don't modify already-applied migrations** - They're part of history
2. **Don't skip version numbers** - Gaps make the migration history confusing
3. **Don't use destructive operations without a backup** - `DROP COLUMN`, `DROP TABLE`
4. **Don't run long-running migrations during peak hours** - Use low-traffic windows
5. **Don't add NOT NULL without a default** on non-empty tables - It will fail on existing NULL rows
6. **Don't assume order of execution** - Always number sequentially
7. **Don't mix unrelated changes** in one migration file

### Zero-Downtime Migrations

#### Adding a Column

```sql
-- Step 1: Add column as nullable or with a default (fast)
ALTER TABLE orders ADD COLUMN status VARCHAR(50);

-- Step 2: Deploy code that writes to the new column
-- Your application updates to populate status

-- Step 3: Backfill existing rows (if needed)
UPDATE orders SET status = 'completed' WHERE status IS NULL AND shipped_at IS NOT NULL;

-- Step 4: Make column NOT NULL (if needed) - only after all rows have values
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```

#### Renaming a Column

```sql
-- Step 1: Add new column
ALTER TABLE orders ADD COLUMN order_status VARCHAR(50);

-- Step 2: Deploy code writing to both old and new columns (dual-write)

-- Step 3: Backfill data
UPDATE orders SET order_status = status;

-- Step 4: Deploy code reading from the new column, stop writing to the old one

-- Step 5: Drop old column (in a separate migration)
ALTER TABLE orders DROP COLUMN status;
```

## Rollback Strategies

### Manual Rollback

For each migration, you may want to write a corresponding "down" migration:

```sql
-- 002_add_user_email.sql (UP)
ALTER TABLE users ADD COLUMN email VARCHAR(320);

-- 002_add_user_email_rollback.sql (DOWN)
ALTER TABLE users DROP COLUMN IF EXISTS email;
```

Store rollback scripts alongside migrations or in a separate `rollbacks/` directory.

### Point-in-Time Recovery

**Best strategy**: Restore the database from backup to a point before the bad migration, then re-apply the good migrations.
```bash
# Point-in-time recovery requires WAL archiving: restore the base backup,
# then set a recovery target before starting the server, e.g. in postgresql.conf:
#   recovery_target_time = '2025-03-18 10:30:00'
# (pg_restore alone cannot replay to a point in time)

# Re-run migrations up to the good version
# Note: src/migrate.py applies all pending migrations; selective re-apply
# requires removing the unwanted rows from schema_migrations first
python src/migrate.py
```

### Selective Rollback Script

```python
# rollback.py
import sys

from helpers import init_db_connection


def rollback(to_version: int):
    conn = init_db_connection()
    cursor = conn.cursor()

    # Find migrations after the target version
    cursor.execute("""
        SELECT version, name FROM schema_migrations
        WHERE version > %s
        ORDER BY version DESC
    """, (to_version,))
    migrations = cursor.fetchall()

    for version, name in migrations:
        # name already includes the .sql extension, e.g. "002_add_user_email.sql"
        rollback_file = f"rollbacks/{name}"
        print(f"Rolling back {name} using {rollback_file}...")

        with open(rollback_file, 'r') as f:
            sql = f.read()

        cursor.execute(sql)
        cursor.execute("DELETE FROM schema_migrations WHERE version = %s", (version,))
        conn.commit()
        print(f"  Rolled back {name}")

    conn.close()
    print(f"Rolled back to version {to_version}")


if __name__ == "__main__":
    target = int(sys.argv[1])
    rollback(target)
```

## Automation

### CI/CD Integration

In your deployment pipeline:

```bash
# Before deploying new code
python src/migrate.py

# If migrations fail, abort deployment
if [ $? -ne 0 ]; then
    echo "Migrations failed, aborting deployment"
    exit 1
fi

# Deploy new code
fission deploy
```

### Pre-deployment Hooks

Use Fission hooks to run migrations automatically:

```json
{
  "hooks": {
    "function_pre_deploy": [
      {
        "type": "http",
        "url": "http://migration-service/migrate",
        "timeout": 300000
      }
    ]
  }
}
```

Or simpler: run the migration as part of `build.sh`:

```bash
#!/bin/sh
# src/build.sh

# Install dependencies
pip3 install -r requirements.txt -t .

# Run migrations against test DB (or do nothing, migrations are separate)
# python ../migrate.py

# Package up
cp -r . ${DEPLOY_PKG}
```

### Database Change Management Tools

Consider specialized tools for larger teams:

- **[Flyway](https://flywaydb.org/)** - Java-based, supports repeatable migrations
- **[Liquibase](https://www.liquibase.org/)** - XML/YAML/JSON migrations
- **[Prisma Migrate](https://www.prisma.io/docs/concepts/components/prisma-migrate)** - If using the Prisma ORM
- **[Alembic](https://alembic.sqlalchemy.org/)** - Python, SQLAlchemy-specific

## Example Workflow

1. **Create migration**:

   ```bash
   touch migrates/004_add_orders_table.sql
   ```

2. **Write SQL**:

   ```sql
   CREATE TABLE orders (
       id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
       user_id UUID NOT NULL REFERENCES users(id),
       total DECIMAL(10,2) NOT NULL,
       status VARCHAR(50) DEFAULT 'pending',
       created_at TIMESTAMPTZ DEFAULT NOW()
   );
   CREATE INDEX idx_orders_user_id ON orders(user_id);
   ```

3. **Test locally**:

   ```bash
   createdb test_migration
   psql test_migration -f migrates/004_add_orders_table.sql
   ```

4. **Commit migration file**:

   ```bash
   git add migrates/004_add_orders_table.sql
   git commit -m "Add orders table"
   ```

5. **Apply to staging**:

   ```bash
   # Update dev-deployment.json if new env vars are needed
   fission deploy --dev
   python src/migrate.py
   ```

6. **Apply to production**:

   ```bash
   # Maintenance window or blue-green deployment
   fission deploy
   python src/migrate.py
   ```

## Troubleshooting

### Migration Fails

Check the error message:

- **syntax error**: Validate the SQL manually with `psql -c "SQL"`
- **duplicate column**: Migration already applied; check `schema_migrations`
- **permission denied**: DB user lacks ALTER/CREATE privileges
- **lock timeout**: Another migration is running; wait or kill the blocking process

### Migration Already Applied But Failed

If a migration was recorded in `schema_migrations` but failed mid-way:

1. Manually revert the partial changes or fix the broken state
2. Delete the row from `schema_migrations`: `DELETE FROM schema_migrations WHERE version = 4;`
3. Re-run the migration

### Long-Running Migration

Large table alterations can take heavy locks and cause downtime:

- Run during a low-traffic period
- Use `CONCURRENTLY` for index creation (PostgreSQL):

  ```sql
  CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);
  ```

- For adding NOT NULL, populate values first with UPDATE, then add the constraint
- Consider `pg_repack` for online table reorganization

## Summary

- Store migrations in the `migrates/` directory, numbered sequentially
- Use `init_db_connection()` to run migrations programmatically
- Test migrations on a staging database before production
- Keep migrations backward compatible when possible
- Have a rollback plan (backups, down scripts)
- Integrate migrations into the CI/CD pipeline
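The naming rules above (unique, zero-padded, monotonically increasing version numbers) are easy to enforce with a small check before applying anything. This is a minimal sketch; `check_versions` is a hypothetical helper, not part of the template, and the filename pattern is an assumption based on the convention described in Migration Files:

```python
# Sketch: validate migration filenames against the naming convention
# before running them. check_versions is a hypothetical helper.
import re


def check_versions(filenames):
    """Return version numbers in order, raising on bad or conflicting names."""
    versions = []
    for name in sorted(filenames):  # lexicographic order, as a runner would see them
        match = re.match(r"^(\d+)_[\w-]+\.sql$", name)
        if not match:
            raise ValueError(f"Bad migration filename: {name}")
        versions.append(int(match.group(1)))
    if len(set(versions)) != len(versions):
        raise ValueError("Duplicate migration version numbers")
    if versions != sorted(versions):
        # Happens when prefixes are not zero-padded, e.g. 10_ sorts before 2_
        raise ValueError("Versions not monotonically increasing in sort order")
    return versions
```

A check like this can run in CI so that a duplicate or unpadded prefix fails the build before it ever reaches a database.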
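The advisory-lock coordination mentioned under Transactional Migrations can be sketched as follows. PostgreSQL's `pg_advisory_lock`/`pg_advisory_unlock` serialize sessions on an arbitrary integer key; `MIGRATION_LOCK_KEY` here is a made-up project constant, and the connection is assumed to come from the template's `init_db_connection()`:

```python
# Sketch: prevent two deployers from running migrations concurrently by
# holding a PostgreSQL session-level advisory lock around the run.
from contextlib import contextmanager

MIGRATION_LOCK_KEY = 724242  # hypothetical project-wide constant (any int64)


@contextmanager
def migration_lock(conn):
    cursor = conn.cursor()
    # Blocks until no other session holds the lock on this key
    cursor.execute("SELECT pg_advisory_lock(%s)", (MIGRATION_LOCK_KEY,))
    try:
        yield
    finally:
        cursor.execute("SELECT pg_advisory_unlock(%s)", (MIGRATION_LOCK_KEY,))
```

In a runner like `src/migrate.py`, the apply loop would then sit inside `with migration_lock(conn): ...`, so a second deployment simply waits instead of racing the first.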