# Plan: Enhance Fission Python Projects with Exceptions, Pydantic Models, and Code Quality Improvements ## Context Three Fission Python projects need systematic improvements to enhance error handling, data validation, and code maintainability: - **py-eom-storage**: Storage management API (GET/POST /storages, GET/PUT/DELETE /storages/{id}) - **py-eom-quota**: Quota management API (GET/POST /quotas, POST/DELETE /users/{userId}/quotas/{quotaId}) - **py-ailbl-scheduler**: Background worker system for scheduled tasks Currently, all projects use generic `Exception` with simple error messages returned as `{"error": str(err)}` with 500 status. There's no structured error handling, request validation, or consistent response formatting. Some projects have Pydantic models but not comprehensively used. ## Goals 1. **Custom Exceptions**: Implement domain-specific exception classes with: - `error_code`: Machine-readable error identifier - `http_status_code`: Appropriate HTTP status (400, 404, 409, 500, etc.) - `error_msg`: Human-readable message - `x_user`: User identifier from request header (X-Fission-Params-UserId or similar) 2. **Pydantic Models**: Add comprehensive request/response models for all endpoints: - Request body validation (POST/PUT) - Query parameter validation (GET) - Structured response schemas - Consistent error response format 3. **Code Quality**: Improve maintainability with: - Detailed docstrings for all functions and classes - Refactoring of complex, multi-responsibility functions - Consistent error handling patterns - Fix broken imports and type issues ## Project-Specific Plans ### 1. py-eom-storage **Current State:** - Has Pydantic models: `S3Resource`, `S3Credential` (unused) - Uses dataclasses: `Page`, `Filter` (should be Pydantic) - Endpoints: `/eom/admin/storages` (filter_or_insert.py), `/eom/admin/storages/{StorageId}` (update_or_delete.py) **Changes Needed:** **A. Create `src/exceptions.py`:** ```python class StorageException(Exception): """Base exception for storage-related errors.""" def __init__(self, error_code: str, http_status: int, error_msg: str, x_user: str = None): self.error_code = error_code self.http_status = http_status self.error_msg = error_msg self.x_user = x_user super().__init__(self.error_msg) class ValidationError(StorageException): """Invalid input data.""" class NotFoundError(StorageException): """Resource not found.""" class ConflictError(StorageException): """Resource conflict (e.g., duplicate name).""" class DatabaseError(StorageException): """Database operation failed.""" class S3ConnectionError(StorageException): """S3/MinIO connection failed.""" ``` **B. Create/Update `src/models.py` (or extend existing):** ```python # Request models class StorageCreateRequest(BaseModel): name: str = Field(..., min_length=1, max_length=255) description: typing.Optional[str] = None resource: dict # Should validate S3 structure class StorageUpdateRequest(BaseModel): name: typing.Optional[str] = None description: typing.Optional[str] = None resource: typing.Optional[dict] = None active: typing.Optional[bool] = None # Query models (convert Page/Filter to Pydantic) class StorageFilter(BaseModel): ids: typing.Optional[typing.List[str]] = None keyword: typing.Optional[str] = None collection_id: typing.Optional[str] = None enable: typing.Optional[bool] = None created_from: typing.Optional[datetime] = None created_to: typing.Optional[datetime] = None # ... other filters class StorageQuery(BaseModel): page: int = 0 size: int = Field(8, ge=1, le=100) asc: bool = True sortby: typing.Optional[Literal["name", "enable", "created", "modified"]] = None filter: StorageFilter = StorageFilter() # Response models class StorageResponse(BaseModel): id: str name: str description: typing.Optional[str] resource: dict enable: bool created: datetime modified: datetime class ErrorResponse(BaseModel): error_code: str http_status: int error_msg: str x_user: typing.Optional[str] = None details: typing.Optional[dict] = None ``` **C. Refactor `filter_or_insert.py`:** - Replace try-except to catch custom exceptions - Validate request body using Pydantic in `make_insert_request` - Use Pydantic for query parsing in `make_filter_request` - Add helper function `handle_exception` to format error responses consistently - Extract SQL queries into separate functions for testability - Add comprehensive docstrings explaining each endpoint's behavior **D. Refactor `update_or_delete.py`:** - Similar pattern: custom exceptions, Pydantic validation - Refactor `is_depended_on_storage` - this function does too much, split into smaller helpers - Add detailed comments for each database operation - Ensure proper error messages with appropriate HTTP status codes **E. Update `helpers.py`:** - Add utility `get_user_from_header(request)` to extract x-user from various headers --- ### 2. py-eom-quota **Current State:** - Already has extensive Pydantic models in `models.py` (QuotaPage, UserQuotaPage, ScheduleCreate, etc.) - But: `userquota_filter.py` imports from `quota_update_or_delete` which doesn't exist (broken import) - Need to expand models to cover all request/response scenarios - Endpoints: `/eom/admin/quotas` (filter), `/eom/admin/users/{UserId}/quotas` (filter/insert), `/eom/admin/users/{UserId}/quotas/{QuotaId}` (update/delete) **Changes Needed:** **A. Create `src/exceptions.py`:** ```python class QuotaException(Exception): """Base exception for quota management.""" def __init__(self, error_code: str, http_status: int, error_msg: str, x_user: str = None): self.error_code = error_code self.http_status = http_status self.error_msg = error_msg self.x_user = x_user super().__init__(self.error_msg) class QuotaNotFoundError(QuotaException): """Quota does not exist.""" class UserQuotaConflictError(QuotaException): """User already has this type of quota.""" class ValidationError(QuotaException): """Invalid request data.""" class DatabaseError(QuotaException): """Database operation failed.""" ``` **B. Extend `src/models.py`:** The existing models mix schedule and quota models. Need to: - Separate or clearly document which are for quotas vs schedules - Add request models: ```python class QuotaCreateRequest(BaseModel): name: str description: typing.Optional[str] = None type: QuotaType value: typing.Union[MaxSizeBody, MaxOrderTimesBody] expire: ExpireBody class QuotaUpdateRequest(BaseModel): name: typing.Optional[str] = None description: typing.Optional[str] = None enable: typing.Optional[bool] = None type: typing.Optional[QuotaType] = None value: typing.Optional[typing.Union[MaxSizeBody, MaxOrderTimesBody]] = None expire: typing.Optional[ExpireBody] = None class UserQuotaAssignRequest(BaseModel): quota_id: str ``` - Ensure response models exist (QuotaResponse, UserQuotaResponse) **C. Fix `userquota_filter.py`:** - Fix broken import: `from quota_update_or_delete import __get_by_id` → `from userquota_insert_or_delete import __get_by_id` (or better: move `__get_by_id` to a shared helpers module) - Refactor `make_filter_request`: - Use `UserQuotaPage` Pydantic model properly - Validate user_id header is present using Pydantic - Replace try-except with custom exceptions - Add comprehensive docstring - The function currently manually sets `paging.filter.user_ids = [user_id]` - this should be part of a validation layer **D. Refactor `userquota_insert_or_delete.py`:** - Fix the same broken import pattern (it imports nothing but uses `__get_by_id` in filter) - Add proper request validation using Pydantic models - Replace generic exceptions with `UserQuotaConflictError`, `QuotaNotFoundError`, etc. - Refactor `__validate_user_quota_type` - currently SQL query is hardcoded, add comments explaining business logic - The insert SQL has wrong columns: `INSERT INTO eom_user_quota(id, name, description, type, value, expire)` but the table likely only has (id, user_id, quota_id). Need to check database schema but from the code it seems mismatched. **E. Improve `helpers.py`:** - Add utility functions for extracting and validating user headers - Add consistent error handling helpers --- ### 3. py-ailbl-scheduler **Current State:** - No HTTP endpoints (only time-triggered workers) - No Pydantic models needed per user's choice - Needs custom exceptions and code quality improvements - Workers: `worker_session_picker.py`, `worker_session_poller.py`, `worker_scheduler_scan.py`, `worker_schedule_auto_disable.py` - Common utilities in `common.py`, `helpers.py` **Changes Needed:** **A. Create `src/exceptions.py`:** ```python class SchedulerException(Exception): """Base exception for scheduler operations.""" def __init__(self, error_code: str, error_msg: str, details: dict = None): self.error_code = error_code self.error_msg = error_msg self.details = details super().__init__(self.error_msg) class ScheduleNotFoundError(SchedulerException): """Schedule does not exist.""" class SessionLockError(SchedulerError): """Failed to acquire session lock.""" class DagsterError(SchedulerError): """Dagster pipeline execution failed.""" class CronParseError(SchedulerError): """Invalid cron expression.""" class ConfigurationError(SchedulerError): """Missing or invalid configuration.""" ``` **B. Refactor `worker_scheduler_scan.py`:** This is the most complex function (446 lines). Goals: - Extract helper functions: - `_normalize_cron_for_cronner` (already exists) - `_as_date`, `_as_time` (already exist) - `_within_active_window` (already exists) - `_is_due_by_cron` (already exists) - `_is_due_by_freq` (already exists) - Extract the schedule creation logic into `_create_session_for_schedule(cur, schedule, now, slot_start)` - Extract the candidate schedule selection into `_fetch_due_schedules(cur, now, slot_start, slot_end, limit=50)` - Add detailed docstrings explaining the overall algorithm: "Scan for schedules that are due in the current time slot and create sessions atomically" - Improve variable names (e.g., `s` → `schedule`, `cur` → `cursor`) - Add comments explaining the advisory lock strategy and why it's needed - Ensure proper exception handling with custom exceptions - The function currently catches generic Exception at the end - wrap specific operations with appropriate custom exceptions **C. Refactor `worker_session_picker.py`:** - Similar breakdown: extract `_pick_and_claim_sessions(conn, limit=20)` helper - Extract `_process_kind5_session(session, ctx)` and `_process_kind1_session(session, ctx)` into separate functions - Add detailed docstring explaining the picking strategy (FOR UPDATE SKIP LOCKED) - Replace bare `except Exception` with specific exception types - Add comments explaining the kind handling logic (kind 5 vs kind 1) - The function `_build_run_config_kind5` is specific to that kind - could be moved to a separate module if needed **D. Refactor `worker_session_poller.py`:** - Extract `_update_completed_session(cur, session_id, status_info, now)` helper - Extract `_update_started_session(cur, session_id, started_dt)` helper - Add docstring explaining polling strategy - Replace generic exception handling with `DagsterError` when Dagster calls fail - Add type hints for the row unpacking: `for sid, run_id, started, cron_description, created_by in rows:` **E. Refactor `worker_schedule_auto_disable.py`:** - This is simple enough already but still add comprehensive docstring - Consider adding custom exception for database errors **F. Improve `helpers.py` (in scheduler):** - The `GraphQL` class and related functions are specific to Dagster - add docstrings - `safe_notify` is good, add docstring - Consider creating a `SchedulerHelper` class to group related utilities **G. Improve `common.py`:** - Already has good docstrings but could be expanded - Add type hints to function signatures - Break `launch_pipeline_execution` if too complex (handles multiple error cases) --- ## Common Patterns ### Exception Hierarchy Each project will have: ```python class BaseProjectException(Exception): """Base with error_code, http_status (if applicable), message, metadata.""" pass # Specific exceptions inherit from base class NotFoundError(BaseProjectException): ... class ValidationError(BaseProjectException): ... class ConflictError(BaseProjectException): ... class DatabaseError(BaseProjectException): ... # Domain-specific: StorageNotFoundError, QuotaConflictError, ScheduleNotFoundError, etc. ``` ### Error Response Format Standardized JSON response: ```json { "error_code": "STORAGE_NOT_FOUND", "http_status": 404, "error_msg": "Storage with id 'xyz' does not exist", "x_user": "user123", "details": { /* optional additional context */ } } ``` ### Middleware Pattern In each HTTP endpoint function: ```python def main(): try: # Extract user header x_user = request.headers.get("X-Fission-Params-UserId") # Route to handler return handler() except ValidationError as e: return error_response(e), 400 except NotFoundError as e: return error_response(e), 404 except ConflictError as e: return error_response(e), 409 except StorageException as e: logger.error(f"Storage error: {e.error_code}: {e.error_msg}") return error_response(e), 500 except Exception as e: logger.exception("Unexpected error") return {"error": "Internal server error"}, 500 ``` --- ## Implementation Order 1. **Phase 1**: Create exception modules for all three projects 2. **Phase 2**: Add/expand Pydantic models (storage, then complete quota) 3. **Phase 3**: Refactor endpoints to use exceptions and models 4. **Phase 4**: Refactor complex functions in scheduler 5. **Phase 5**: Documentation pass - ensure all functions have docstrings 6. **Phase 6**: Test manually by running functions (no automated tests to update) --- ## Verification Steps 1. **Manual Testing**: - Deploy each function to local Fission or use test environment - Test error cases: invalid input, missing resources, database failures - Verify error response format matches specification - Check logs for proper error logging 2. **Code Review**: - All functions have docstrings with Args, Returns, Raises sections - No function exceeds ~50 lines (extracted helpers where needed) - All exceptions are specific, not generic `Exception` - Request validation happens before business logic 3. **Import Verification**: - Fix broken imports (especially in py-eom-quota's userquota_filter.py) - Ensure circular dependencies are avoided 4. **Type Safety**: - Run static type checker if available (mypy/pyright) - Ensure all functions have return type hints --- ## Critical Files to Modify **py-eom-storage:** - `src/exceptions.py` (new) - `src/models.py` (create/extend) - `src/filter_or_insert.py` (refactor) - `src/update_or_delete.py` (refactor) - `src/helpers.py` (add utilities) - `src/vault.py` (minor: improve docs) **py-eom-quota:** - `src/exceptions.py` (new) - `src/models.py` (extend with request models) - `src/userquota_filter.py` (fix imports, refactor) - `src/userquota_insert_or_delete.py` (refactor, fix SQL if needed) - `src/helpers.py` (add utilities) **py-ailbl-scheduler:** - `src/exceptions.py` (new) - `src/worker_scheduler_scan.py` (major refactor) - `src/worker_session_picker.py` (refactor) - `src/worker_session_poller.py` (refactor) - `src/worker_schedule_auto_disable.py` (docs) - `src/common.py` (docs, type hints) - `src/helpers.py` (docs, maybe extract class) --- ## Notes - All changes are in `/workspaces/claude-marketplace/data/examples/` - Preserve existing API contracts (URLs, HTTP methods) - Do not change database schema - Maintain backward compatibility with existing clients - Focus on internal improvements: error handling, validation, documentation - Use consistent patterns across all three projects