ref: up

2026-03-18 20:21:56 +07:00
commit 29667cd92f
58 changed files with 8459 additions and 0 deletions
--- a/.claude/plans/rosy-giggling-flurry.md
+++ b/.claude/plans/rosy-giggling-flurry.md
@@ -0,0 +1,421 @@
+# Plan: Enhance Fission Python Projects with Exceptions, Pydantic Models, and Code Quality Improvements
+
+## Context
+
+Three Fission Python projects need systematic improvements to enhance error handling, data validation, and code maintainability:
+
+- **py-eom-storage**: Storage management API (GET/POST /storages, GET/PUT/DELETE /storages/{id})
+- **py-eom-quota**: Quota management API (GET/POST /quotas, POST/DELETE /users/{userId}/quotas/{quotaId})
+- **py-ailbl-scheduler**: Background worker system for scheduled tasks
+
+Currently, all projects raise generic `Exception` and return simple error messages as `{"error": str(err)}` with a 500 status. There is no structured error handling, request validation, or consistent response formatting. Some projects have Pydantic models but do not use them comprehensively.
+
+## Goals
+
+1. **Custom Exceptions**: Implement domain-specific exception classes with:
+   - `error_code`: Machine-readable error identifier
+   - `http_status`: Appropriate HTTP status (400, 404, 409, 500, etc.)
+   - `error_msg`: Human-readable message
+   - `x_user`: User identifier from request header (X-Fission-Params-UserId or similar)
+
+2. **Pydantic Models**: Add comprehensive request/response models for all endpoints:
+   - Request body validation (POST/PUT)
+   - Query parameter validation (GET)
+   - Structured response schemas
+   - Consistent error response format
+
+3. **Code Quality**: Improve maintainability with:
+   - Detailed docstrings for all functions and classes
+   - Refactoring of complex, multi-responsibility functions
+   - Consistent error handling patterns
+   - Fix broken imports and type issues
+
+## Project-Specific Plans
+
+### 1. py-eom-storage
+
+**Current State:**
+- Has Pydantic models: `S3Resource`, `S3Credential` (unused)
+- Uses dataclasses: `Page`, `Filter` (should be Pydantic)
+- Endpoints: `/eom/admin/storages` (filter_or_insert.py), `/eom/admin/storages/{StorageId}` (update_or_delete.py)
+
+**Changes Needed:**
+
+**A. Create `src/exceptions.py`:**
+```python
+class StorageException(Exception):
+    """Base exception for storage-related errors."""
+    def __init__(self, error_code: str, http_status: int, error_msg: str, x_user: str = None):
+        self.error_code = error_code
+        self.http_status = http_status
+        self.error_msg = error_msg
+        self.x_user = x_user
+        super().__init__(self.error_msg)
+
+class ValidationError(StorageException):
+    """Invalid input data."""
+class NotFoundError(StorageException):
+    """Resource not found."""
+class ConflictError(StorageException):
+    """Resource conflict (e.g., duplicate name)."""
+class DatabaseError(StorageException):
+    """Database operation failed."""
+class S3ConnectionError(StorageException):
+    """S3/MinIO connection failed."""
+```
+
+**B. Create/Update `src/models.py` (or extend existing):**
+```python
+# Request models
+class StorageCreateRequest(BaseModel):
+    name: str = Field(..., min_length=1, max_length=255)
+    description: typing.Optional[str] = None
+    resource: dict  # Should validate S3 structure
+
+class StorageUpdateRequest(BaseModel):
+    name: typing.Optional[str] = None
+    description: typing.Optional[str] = None
+    resource: typing.Optional[dict] = None
+    active: typing.Optional[bool] = None
+
+# Query models (convert Page/Filter to Pydantic)
+class StorageFilter(BaseModel):
+    ids: typing.Optional[typing.List[str]] = None
+    keyword: typing.Optional[str] = None
+    collection_id: typing.Optional[str] = None
+    enable: typing.Optional[bool] = None
+    created_from: typing.Optional[datetime] = None
+    created_to: typing.Optional[datetime] = None
+    # ... other filters
+
+class StorageQuery(BaseModel):
+    page: int = 0
+    size: int = Field(8, ge=1, le=100)
+    asc: bool = True
+    sortby: typing.Optional[typing.Literal["name", "enable", "created", "modified"]] = None
+    filter: StorageFilter = Field(default_factory=StorageFilter)
+
+# Response models
+class StorageResponse(BaseModel):
+    id: str
+    name: str
+    description: typing.Optional[str] = None
+    resource: dict
+    enable: bool
+    created: datetime
+    modified: datetime
+
+class ErrorResponse(BaseModel):
+    error_code: str
+    http_status: int
+    error_msg: str
+    x_user: typing.Optional[str] = None
+    details: typing.Optional[dict] = None
+```
+
+**C. Refactor `filter_or_insert.py`:**
+- Replace the generic try-except with handlers that catch custom exceptions
+- Validate request body using Pydantic in `make_insert_request`
+- Use Pydantic for query parsing in `make_filter_request`
+- Add helper function `handle_exception` to format error responses consistently
+- Extract SQL queries into separate functions for testability
+- Add comprehensive docstrings explaining each endpoint's behavior
+
+**D. Refactor `update_or_delete.py`:**
+- Similar pattern: custom exceptions, Pydantic validation
+- Refactor `is_depended_on_storage` - this function does too much, split into smaller helpers
+- Add detailed comments for each database operation
+- Ensure proper error messages with appropriate HTTP status codes
+
+**E. Update `helpers.py`:**
+- Add utility `get_user_from_header(request)` to extract x-user from various headers
+
+---
+
+### 2. py-eom-quota
+
+**Current State:**
+- Already has extensive Pydantic models in `models.py` (QuotaPage, UserQuotaPage, ScheduleCreate, etc.)
+- However, `userquota_filter.py` imports from `quota_update_or_delete`, which does not exist (broken import)
+- Models need to be expanded to cover all request/response scenarios
+- Endpoints: `/eom/admin/quotas` (filter), `/eom/admin/users/{UserId}/quotas` (filter/insert), `/eom/admin/users/{UserId}/quotas/{QuotaId}` (update/delete)
+
+**Changes Needed:**
+
+**A. Create `src/exceptions.py`:**
+```python
+class QuotaException(Exception):
+    """Base exception for quota management."""
+    def __init__(self, error_code: str, http_status: int, error_msg: str, x_user: str = None):
+        self.error_code = error_code
+        self.http_status = http_status
+        self.error_msg = error_msg
+        self.x_user = x_user
+        super().__init__(self.error_msg)
+
+class QuotaNotFoundError(QuotaException):
+    """Quota does not exist."""
+class UserQuotaConflictError(QuotaException):
+    """User already has this type of quota."""
+class ValidationError(QuotaException):
+    """Invalid request data."""
+class DatabaseError(QuotaException):
+    """Database operation failed."""
+```
+
+**B. Extend `src/models.py`:**
+The existing models mix schedule and quota models. Need to:
+- Separate or clearly document which are for quotas vs schedules
+- Add request models:
+  ```python
+  class QuotaCreateRequest(BaseModel):
+      name: str
+      description: typing.Optional[str] = None
+      type: QuotaType
+      value: typing.Union[MaxSizeBody, MaxOrderTimesBody]
+      expire: ExpireBody
+
+  class QuotaUpdateRequest(BaseModel):
+      name: typing.Optional[str] = None
+      description: typing.Optional[str] = None
+      enable: typing.Optional[bool] = None
+      type: typing.Optional[QuotaType] = None
+      value: typing.Optional[typing.Union[MaxSizeBody, MaxOrderTimesBody]] = None
+      expire: typing.Optional[ExpireBody] = None
+
+  class UserQuotaAssignRequest(BaseModel):
+      quota_id: str
+  ```
+
+- Ensure response models exist (QuotaResponse, UserQuotaResponse)
+
+**C. Fix `userquota_filter.py`:**
+- Fix broken import: `from quota_update_or_delete import __get_by_id` → `from userquota_insert_or_delete import __get_by_id` (or better: move `__get_by_id` to a shared helpers module)
+- Refactor `make_filter_request`:
+  - Use `UserQuotaPage` Pydantic model properly
+  - Validate user_id header is present using Pydantic
+  - Replace try-except with custom exceptions
+  - Add comprehensive docstring
+  - The function currently manually sets `paging.filter.user_ids = [user_id]` - this should be part of a validation layer
+
+**D. Refactor `userquota_insert_or_delete.py`:**
+- Resolve the related import problem: this module has no imports of its own, but `userquota_filter.py` relies on its `__get_by_id`
+- Add proper request validation using Pydantic models
+- Replace generic exceptions with `UserQuotaConflictError`, `QuotaNotFoundError`, etc.
+- Refactor `__validate_user_quota_type`: the SQL query is currently hardcoded, so add comments explaining the business logic
+- The insert SQL appears to use the wrong columns: `INSERT INTO eom_user_quota(id, name, description, type, value, expire)`, while the table likely only has `(id, user_id, quota_id)`. Verify against the database schema before changing the query.
+
+**E. Improve `helpers.py`:**
+- Add utility functions for extracting and validating user headers
+- Add consistent error handling helpers
+
+---
+
+### 3. py-ailbl-scheduler
+
+**Current State:**
+- No HTTP endpoints (only time-triggered workers)
+- No Pydantic models needed per user's choice
+- Needs custom exceptions and code quality improvements
+- Workers: `worker_session_picker.py`, `worker_session_poller.py`, `worker_scheduler_scan.py`, `worker_schedule_auto_disable.py`
+- Common utilities in `common.py`, `helpers.py`
+
+**Changes Needed:**
+
+**A. Create `src/exceptions.py`:**
+```python
+class SchedulerException(Exception):
+    """Base exception for scheduler operations."""
+    def __init__(self, error_code: str, error_msg: str, details: dict = None):
+        self.error_code = error_code
+        self.error_msg = error_msg
+        self.details = details
+        super().__init__(self.error_msg)
+
+class ScheduleNotFoundError(SchedulerException):
+    """Schedule does not exist."""
+class SessionLockError(SchedulerException):
+    """Failed to acquire session lock."""
+class DagsterError(SchedulerException):
+    """Dagster pipeline execution failed."""
+class CronParseError(SchedulerException):
+    """Invalid cron expression."""
+class ConfigurationError(SchedulerException):
+    """Missing or invalid configuration."""
+```
+
+**B. Refactor `worker_scheduler_scan.py`:**
+This is the most complex function (446 lines). Goals:
+- Organize helper functions (several already exist; the last two are new extractions):
+  - `_normalize_cron_for_cronner` (already exists)
+  - `_as_date`, `_as_time` (already exist)
+  - `_within_active_window` (already exists)
+  - `_is_due_by_cron` (already exists)
+  - `_is_due_by_freq` (already exists)
+  - Extract the schedule creation logic into `_create_session_for_schedule(cur, schedule, now, slot_start)`
+  - Extract the candidate schedule selection into `_fetch_due_schedules(cur, now, slot_start, slot_end, limit=50)`
+- Add detailed docstrings explaining the overall algorithm: "Scan for schedules that are due in the current time slot and create sessions atomically"
+- Improve variable names (e.g., `s` → `schedule`, `cur` → `cursor`)
+- Add comments explaining the advisory lock strategy and why it's needed
+- Ensure proper exception handling with custom exceptions
+- The function currently catches generic Exception at the end - wrap specific operations with appropriate custom exceptions
+
+**C. Refactor `worker_session_picker.py`:**
+- Similar breakdown: extract `_pick_and_claim_sessions(conn, limit=20)` helper
+- Extract `_process_kind5_session(session, ctx)` and `_process_kind1_session(session, ctx)` into separate functions
+- Add detailed docstring explaining the picking strategy (FOR UPDATE SKIP LOCKED)
+- Replace bare `except Exception` with specific exception types
+- Add comments explaining the kind handling logic (kind 5 vs kind 1)
+- The function `_build_run_config_kind5` is specific to that kind - could be moved to a separate module if needed
+
+**D. Refactor `worker_session_poller.py`:**
+- Extract `_update_completed_session(cur, session_id, status_info, now)` helper
+- Extract `_update_started_session(cur, session_id, started_dt)` helper
+- Add docstring explaining polling strategy
+- Replace generic exception handling with `DagsterError` when Dagster calls fail
+- Add type hints for the row unpacking: `for sid, run_id, started, cron_description, created_by in rows:`
+
+**E. Refactor `worker_schedule_auto_disable.py`:**
+- This worker is already simple enough, but still add a comprehensive docstring
+- Consider adding custom exception for database errors
+
+**F. Improve `helpers.py` (in scheduler):**
+- The `GraphQL` class and related functions are specific to Dagster - add docstrings
+- `safe_notify` is good, add docstring
+- Consider creating a `SchedulerHelper` class to group related utilities
+
+**G. Improve `common.py`:**
+- Already has good docstrings but could be expanded
+- Add type hints to function signatures
+- Break up `launch_pipeline_execution` if it is too complex (it handles multiple error cases)
+
+---
+
+## Common Patterns
+
+### Exception Hierarchy
+
+Each project will have:
+```python
+class BaseProjectException(Exception):
+    """Base with error_code, http_status (if applicable), message, metadata."""
+    pass
+
+# Specific exceptions inherit from base
+class NotFoundError(BaseProjectException): ...
+class ValidationError(BaseProjectException): ...
+class ConflictError(BaseProjectException): ...
+class DatabaseError(BaseProjectException): ...
+# Domain-specific: StorageNotFoundError, QuotaConflictError, ScheduleNotFoundError, etc.
+```
+
+### Error Response Format
+
+Standardized JSON response (`details` is optional and carries additional context):
+```json
+{
+  "error_code": "STORAGE_NOT_FOUND",
+  "http_status": 404,
+  "error_msg": "Storage with id 'xyz' does not exist",
+  "x_user": "user123",
+  "details": {}
+}
+```
+
+### Middleware Pattern
+
+In each HTTP endpoint function:
+```python
+def main():
+    try:
+        # Extract user header so handlers can attach it to any exception they raise
+        x_user = request.headers.get("X-Fission-Params-UserId")
+        # Route to handler
+        return handler(x_user)
+    except ValidationError as e:
+        return error_response(e), 400
+    except NotFoundError as e:
+        return error_response(e), 404
+    except ConflictError as e:
+        return error_response(e), 409
+    except StorageException as e:
+        logger.error(f"Storage error: {e.error_code}: {e.error_msg}")
+        return error_response(e), 500
+    except Exception as e:
+        logger.exception("Unexpected error")
+        return {"error": "Internal server error"}, 500
+```
+
+---
+
+## Implementation Order
+
+1. **Phase 1**: Create exception modules for all three projects
+2. **Phase 2**: Add/expand Pydantic models (storage, then complete quota)
+3. **Phase 3**: Refactor endpoints to use exceptions and models
+4. **Phase 4**: Refactor complex functions in scheduler
+5. **Phase 5**: Documentation pass - ensure all functions have docstrings
+6. **Phase 6**: Test manually by running functions (no automated tests to update)
+
+---
+
+## Verification Steps
+
+1. **Manual Testing**:
+   - Deploy each function to local Fission or use test environment
+   - Test error cases: invalid input, missing resources, database failures
+   - Verify error response format matches specification
+   - Check logs for proper error logging
+
+2. **Code Review**:
+   - All functions have docstrings with Args, Returns, Raises sections
+   - No function exceeds ~50 lines (extracted helpers where needed)
+   - All exceptions are specific, not generic `Exception`
+   - Request validation happens before business logic
+
+3. **Import Verification**:
+   - Fix broken imports (especially in py-eom-quota's userquota_filter.py)
+   - Ensure circular dependencies are avoided
+
+4. **Type Safety**:
+   - Run static type checker if available (mypy/pyright)
+   - Ensure all functions have return type hints
+
+---
+
+## Critical Files to Modify
+
+**py-eom-storage:**
+- `src/exceptions.py` (new)
+- `src/models.py` (create/extend)
+- `src/filter_or_insert.py` (refactor)
+- `src/update_or_delete.py` (refactor)
+- `src/helpers.py` (add utilities)
+- `src/vault.py` (minor: improve docs)
+
+**py-eom-quota:**
+- `src/exceptions.py` (new)
+- `src/models.py` (extend with request models)
+- `src/userquota_filter.py` (fix imports, refactor)
+- `src/userquota_insert_or_delete.py` (refactor, fix SQL if needed)
+- `src/helpers.py` (add utilities)
+
+**py-ailbl-scheduler:**
+- `src/exceptions.py` (new)
+- `src/worker_scheduler_scan.py` (major refactor)
+- `src/worker_session_picker.py` (refactor)
+- `src/worker_session_poller.py` (refactor)
+- `src/worker_schedule_auto_disable.py` (docs)
+- `src/common.py` (docs, type hints)
+- `src/helpers.py` (docs, maybe extract class)
+
+---
+
+## Notes
+
+- All changes are in `/workspaces/claude-marketplace/data/examples/`
+- Preserve existing API contracts (URLs, HTTP methods)
+- Do not change database schema
+- Maintain backward compatibility with existing clients
+- Focus on internal improvements: error handling, validation, documentation
+- Use consistent patterns across all three projects
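
The plan above names two helpers, `get_user_from_header` and `handle_exception`, without showing them. The following is a minimal sketch of how they might fit together with the proposed `StorageException` to produce the standardized error format. It rests on several assumptions: headers are treated as a plain mapping rather than a framework request object, the `X-User` fallback header is hypothetical, and the `INTERNAL_ERROR` code for unexpected failures is an illustrative choice, not something the plan specifies.

```python
from typing import Mapping, Optional, Tuple


class StorageException(Exception):
    """Mirrors the base exception proposed in the plan."""

    def __init__(self, error_code: str, http_status: int, error_msg: str,
                 x_user: Optional[str] = None):
        self.error_code = error_code
        self.http_status = http_status
        self.error_msg = error_msg
        self.x_user = x_user
        super().__init__(error_msg)


class NotFoundError(StorageException):
    """Resource not found."""


# Header names to try, in order. Only the first appears in the plan;
# "X-User" is a hypothetical fallback used for illustration.
USER_HEADERS = ("X-Fission-Params-UserId", "X-User")


def get_user_from_header(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty user header, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in USER_HEADERS:
        value = lowered.get(name.lower())
        if value:
            return value
    return None


def handle_exception(exc: Exception,
                     x_user: Optional[str] = None) -> Tuple[dict, int]:
    """Convert an exception into a (body, status) pair in the standard format."""
    if isinstance(exc, StorageException):
        body = {
            "error_code": exc.error_code,
            "http_status": exc.http_status,
            "error_msg": exc.error_msg,
            # Prefer the user recorded on the exception, else the caller's value.
            "x_user": exc.x_user or x_user,
        }
        return body, exc.http_status
    # Unexpected errors: keep internals out of the response body.
    # "INTERNAL_ERROR" is an assumed code, not defined in the plan.
    body = {
        "error_code": "INTERNAL_ERROR",
        "http_status": 500,
        "error_msg": "Internal server error",
        "x_user": x_user,
    }
    return body, 500


user = get_user_from_header({"x-fission-params-userid": "user123"})
body, status = handle_exception(
    NotFoundError("STORAGE_NOT_FOUND", 404, "Storage with id 'xyz' does not exist"),
    x_user=user,
)
print(status, body["error_code"], body["x_user"])
```

Taking the status from `exc.http_status` instead of hardcoding it per `except` branch would let a single `except StorageException` clause cover every subclass, at the cost of trusting each raise site to supply the right code.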