ALwrity/docs/BACKEND_LOG_RCA_TRACKER.md

# Backend Log RCA Tracker

## Purpose

This document is the working catalog for backend issues observed in runtime logs.

For each issue, capture:
- error signature
- observed symptoms
- likely root cause analysis
- confidence level
- files to inspect/edit
- fix strategy notes
- validation steps
- status

## Triage Rules

- Do not fix directly from logs alone unless root cause is confirmed.
- Prefer grouping repeated log lines under one issue.
- Track the first failing subsystem, then downstream effects.
- Separate configuration problems from code defects.
- Keep this document updated before and after each fix.

## Issue 1: Clerk token verification failures on authenticated endpoints

- **Status**: Open
- **Severity**: High
- **Subsystem**: Authentication / request pipeline
- **Error signatures**:
  - `Unverified token rejected (production).`
  - `AUTHENTICATION ERROR: Token verification failed for endpoint: GET /api/...`
- **Observed endpoints in logs**:
  - `/api/content-planning/monitoring/lightweight-stats`
  - `/api/content-planning/monitoring/health`
  - `/api/subscription/dashboard/...`
  - `/api/subscription/alerts/...`
  - `/api/subscription/status/...`
- **Observed behavior**:
  - Requests reach authenticated endpoints.
  - Clerk verification fails.
  - Fallback unverified decode path is attempted.
  - Production mode rejects the token.
- **Primary RCA hypothesis**:
  - The backend is receiving bearer tokens that do not successfully validate against the resolved Clerk JWKS/issuer configuration.
  - The middleware then falls back to unverified decode, but production mode explicitly rejects that path.
- **Secondary RCA hypotheses**:
  - Frontend token/audience/issuer mismatch.
  - Wrong Clerk environment variables loaded in backend.
  - Issuer-derived JWKS URL resolution is inconsistent with actual Clerk instance.
  - Requests may be sent before a valid session token is available.
- **Evidence in code**:
  - `backend/middleware/auth_middleware.py`
    - `ClerkAuthMiddleware.__init__`
    - `ClerkAuthMiddleware.verify_token`
    - `get_current_user`
  - Relevant logic:
    - derives JWKS URL from token issuer or cached publishable key instance
    - falls back to `jwt.decode(..., verify_signature=False)`
    - rejects unverified tokens when `ALLOW_UNVERIFIED_JWT_DEV` is false
- **Likely files to inspect/edit later**:
  - `backend/middleware/auth_middleware.py`
  - possibly frontend auth/session request layer if token attachment is inconsistent
- **Confidence**: Medium
- **Root-cause questions to answer**:
  - Are `CLERK_SECRET_KEY` and publishable key values from the same Clerk instance?
  - Is the token issuer exactly matching the intended Clerk environment?
  - Are failing requests sent with stale, dev, or cross-environment tokens?
  - Are these requests triggered before Clerk session hydration on the frontend?
- **Validation after fix**:
  - Authenticated endpoints return 200 with verified user context.
  - No `Unverified token rejected (production)` log spam for healthy requests.

## Issue 2: Hugging Face structured JSON generation failing with model not found

- **Status**: Open
- **Severity**: High
- **Subsystem**: LLM provider / workflow generation
- **Error signatures**:
  - `HF structured model not found: %s. Trying fallback model.`
  - `Hugging Face API call failed: Not Found`
  - `HF structured model not found (no response_format path): %s`
  - `Hugging Face structured JSON generation failed: NotFoundError: Not Found`
  - `[llm_text_gen] Provider huggingface failed: RetryError[...]`
- **Observed behavior**:
  - Structured JSON call tries primary model.
  - Fallback model sequence also fails.
  - Retry without `response_format` still fails with `NotFound`.
  - Upstream caller falls through to another provider or fallback path.
- **Primary RCA hypothesis**:
  - The configured Hugging Face model identifier is invalid, unavailable to the account/provider, or incompatible with the current OpenAI-compatible Hugging Face endpoint.
- **Secondary RCA hypotheses**:
  - Base URL/API key/provider configuration is wrong.
  - Fallback model list contains provider-specific model ids not available in the current account/region.
  - Structured generation path assumes chat completions support for models that only exist on a different inference route.
- **Evidence in code**:
  - `backend/services/llm_providers/huggingface_provider.py`
    - `_fallback_model_sequence`
    - `huggingface_structured_json_response`
  - The code retries:
    - with `response_format={"type": "json_object"}`
    - then again without `response_format`
  - Both paths still fail with `NotFoundError`, which points more strongly to model/base-url availability than schema formatting.
- **Likely files to inspect/edit later**:
  - `backend/services/llm_providers/huggingface_provider.py`
  - provider selection/orchestration file calling Hugging Face as primary for structured JSON
  - environment/config file for HF model names and API base URL
- **Confidence**: High
- **Root-cause questions to answer**:
  - Which exact model string is being passed as the primary model in the failing call?
  - What base URL and API key are being used for the OpenAI client?
  - Are the fallback model ids valid for the currently configured Hugging Face inference provider?
- **Validation after fix**:
  - A structured JSON test request succeeds with the intended model or a verified fallback.
  - No `NotFoundError` for the chosen model list.

## Issue 3: txtai indexing attempted before service initialization completes

- **Status**: Open
- **Severity**: Medium
- **Subsystem**: Semantic indexing / background tasks
- **Error signatures**:
  - `Cannot index content - service not initialized for user ...`
- **Observed behavior**:
  - Background indexing is triggered.
  - `TxtaiIntelligenceService.index_content` calls `_ensure_initialized()`.
  - `_ensure_initialized()` starts background initialization and returns immediately.
  - `index_content` then checks `_initialized`, sees false, and fails fast.
- **Primary RCA hypothesis**:
  - There is a race condition between lazy background initialization and immediate indexing/search calls.
  - `SIF_FAIL_FAST=true` (default) causes operations to raise RuntimeError instead of gracefully deferring.
- **Evidence in code**:
  - `backend/services/intelligence/txtai_service.py`:
    - Line 57: `self.fail_fast = str(os.getenv("SIF_FAIL_FAST", "true")).lower() in {"1", "true", "yes", "on"}`
    - Lines 234-235: `index_content` raises RuntimeError if `fail_fast` and not initialized
    - Lines 284-285: `search` raises RuntimeError if `fail_fast` and not initialized
    - Lines 319-320: `get_similarity` raises RuntimeError if `fail_fast` and not initialized
    - `_ensure_initialized` is intentionally non-blocking (starts background thread)
  - `backend/api/today_workflow.py`:
    - `_index_tasks_to_sif` triggers indexing in background after workflow actions
- **Likely files to inspect/edit later**:
  - `backend/services/intelligence/txtai_service.py`
  - `backend/api/today_workflow.py`
  - any other callers that assume initialization is synchronous
- **Confidence**: High
- **Potential downstream impact**:
  - workflow/task indexing silently fails
  - semantic search quality degrades
  - noisy logs obscure higher-priority failures
- **Root-cause questions to answer**:
  - Should `index_content` await `_ensure_initialized_async()` instead of using the non-blocking path?
  - Should callers tolerate deferred indexing instead of fail-fast behavior?
  - Is `SIF_FAIL_FAST=true` appropriate for background indexing operations?
  - Should `SIF_FAIL_FAST` default to `false` for background operations?
- **Validation after fix**:
  - First indexing call after startup succeeds or is gracefully deferred without error spam.

## Issue 4: Today workflow endpoint reload observed during active debugging

- **Status**: Observed
- **Severity**: Low
- **Subsystem**: Development reload / workflow API
- **Log signature**:
  - `StatReload detected changes in 'api\today_workflow.py'. Reloading...`
- **Observed behavior**:
  - Development server reloads due to file edits.
- **RCA**:
  - Expected dev-server behavior, not itself a product bug.
- **Files involved**:
  - `backend/api/today_workflow.py`
- **Confidence**: High
- **Action**:
  - No fix needed; keep separate from actual runtime defects.

## Cross-Issue Notes

- The auth failures and the workflow/indexing issues may be independent.
- The Hugging Face failure may trigger fallback task generation, which can still create workflows while hiding the upstream provider problem.
- txtai indexing failures appear to be a post-generation side effect, not the root cause of generation failure.
- **LiteLLM was investigated and dropped as a false herring** – no project-level SIF/txtai wiring to LiteLLM was found.
- The SIF agent local-model path is **separate** from txtai embeddings and may be the source of the "local model used to work" feedback.

## Candidate Investigation Order

1. Authentication verification mismatch
2. Hugging Face model/provider availability mismatch
3. txtai initialization race (with `SIF_FAIL_FAST` behavior)
4. SIF agent local-model defaults (Qwen 1.5B vs lighter alternatives)
5. Any downstream workflow symptoms after the above are stabilized

## Minimal Fix Paths (Pre-Implementation)

### For Issue 3 (txtai init race):
- **Option A**: Change `SIF_FAIL_FAST` default to `false` for background operations
  - Allows graceful deferral instead of RuntimeError
  - Minimal code change, no logic changes
- **Option B**: Use `_ensure_initialized_async()` in `index_content`/`search`/`get_similarity`
  - Awaits initialization before proceeding
  - More robust but requires async refactoring
- **Option C**: Add initialization state callbacks to callers
  - More complex, may not be necessary

### For Issue 5 (SIF agent local-model drift):
- **Option A**: Change default `model_name` in `SIFBaseAgent.__init__` to lighter model
  - Example: `Qwen/Qwen2.5-0.5B-Instruct` or `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
  - Single-line change, immediate effect
- **Option B**: Add env/config override for default agent local model
  - More flexible, requires config wiring
  - Allows runtime tuning without code changes
- **Option C**: Keep current default and rely on existing fallback chain
  - The fallback chain already tries lighter models if memory fails
  - May be sufficient if memory detection works correctly

## Current Evidence Sources

- Runtime logs from terminal `python` process `22056`
- `backend/middleware/auth_middleware.py`
- `backend/services/llm_providers/huggingface_provider.py`
- `backend/services/intelligence/txtai_service.py`
- `backend/api/today_workflow.py`
- `backend/services/today_workflow_service.py`

## Issue 5: SIF agent local-model drift (distinct from txtai embeddings)

- **Status**: Open
- **Severity**: Medium
- **Subsystem**: SIF agents / local LLM wrappers
- **Error signatures**:
  - (No direct log signature yet; this is a hypothesis from user feedback that "local model used to work")
- **Observed behavior**:
  - User reports that a local model used to work for SIF agents now seems heavier or less responsive.
  - The SIF agent path is **separate** from txtai embeddings.
- **Primary RCA hypothesis**:
  - The SIF agent local LLM wrapper path uses a 1.5B parameter model by default, which may be heavier than the previous local model.
  - This is distinct from txtai embeddings, which still use `sentence-transformers/all-MiniLM-L6-v2`.
- **Evidence in code**:
  - `backend/services/intelligence/sif_agents.py`:
    - Lines 47-51: `LOCAL_LLM_FALLBACKS = ["Qwen/Qwen2.5-1.5B-Instruct", "Qwen/Qwen2.5-0.5B-Instruct", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"]`
    - Lines 53-139: `LocalLLMWrapper` tries models in order, with memory issue detection and automatic fallback to smaller models
    - Line 141: `SIFBaseAgent.__init__` default `model_name="Qwen/Qwen2.5-1.5B-Instruct"`
  - `backend/services/intelligence/txtai_service.py`:
    - Line 48: Still uses `sentence-transformers/all-MiniLM-L6-v2` for embeddings
- **Likely files to inspect/edit later**:
  - `backend/services/intelligence/sif_agents.py`
  - `backend/services/intelligence/agents/specialized/base.py`
  - any config/env that controls default agent local model
- **Confidence**: Medium
- **Root-cause questions to answer**:
  - What was the previous local model default for SIF agents?
  - Is `Qwen/Qwen2.5-1.5B-Instruct` actually too heavy for the user’s laptop?
  - Should the default be changed to `Qwen/Qwen2.5-0.5B-Instruct` or `TinyLlama/TinyLlama-1.1B-Chat-v1.0`?
  - Are there any env/config overrides that could make this configurable?
- **Validation after fix**:
  - SIF agents use a CPU-friendly local model (e.g., smaller Qwen variant or TinyLlama).
  - Agent generation completes without excessive CPU/memory pressure.

## Issue 6: Model initialization blocking and module unification

- **Status**: Open
- **Severity**: High
- **Subsystem**: Startup / model loading / module architecture
- **Error signatures**:
  - (No direct log signature; architectural issue)
- **Observed behavior**:
  - `start_alwrity_backend.py` pre-downloads `Qwen/Qwen2.5-3B-Instruct` **synchronously** before server starts (line 122).
  - `sif_agents.py` defaults to `Qwen/Qwen2.5-1.5B-Instruct` and uses lazy loading via `LocalLLMWrapper`.
  - `txtai_service.py` uses `sentence-transformers/all-MiniLM-L6-v2` for embeddings.
  - Three separate modules handle model loading, creating confusion.
  - User wants fail-fast semantics (catch bugs, avoid silent failures) AND proper fallback.
  - User wants non-blocking model downloads for SIF/agents.
- **Primary RCA hypothesis**:
  - Startup script blocks on model download, contradicting non-blocking requirement.
  - Model size mismatch: startup downloads 3B, agents default to 1.5B.
  - Fail-fast in `txtai_service.py` prevents fallback from working.
  - Module separation (`txtai_service.py`, `sif_agents.py`, `start_alwrity_backend.py`) creates confusion.
- **Evidence in code**:
  - `start_alwrity_backend.py`:
    - Line 122: `target_model = "Qwen/Qwen2.5-3B-Instruct"`
    - Lines 127-131: `snapshot_download()` is **blocking** call
    - Lines 117-120: Skips on Render/Railway but **not** on local dev
  - `sif_agents.py`:
    - Line 48: `LOCAL_LLM_FALLBACKS = ["Qwen/Qwen2.5-1.5B-Instruct", "Qwen/Qwen2.5-0.5B-Instruct", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"]`
    - Line 141: Default `model_name="Qwen/Qwen2.5-1.5B-Instruct"`
    - Line 150: Uses `LocalLLMWrapper` for lazy loading
    - Lines 94-130: Has fallback logic with memory issue detection
  - `txtai_service.py`:
    - Line 57: `SIF_FAIL_FAST=true` (default) causes RuntimeError
    - Lines 234-235, 284-285, 319-320: Fail-fast prevents fallback
- **Likely files to inspect/edit later**:
  - `start_alwrity_backend.py` (remove blocking download)
  - `services/intelligence/sif_agents.py` (unify model defaults)
  - `services/intelligence/txtai_service.py` (fix fail-fast with fallback)
  - Create unified `services/intelligence/model_registry.py` or similar
- **Confidence**: High
- **Root-cause questions to answer**:
  - Should model download be truly non-blocking (background thread)?
  - Should fail-fast be conditional (e.g., only for critical paths, not background ops)?
  - Should module unification create a single `ModelRegistry` or `ModelManager`?
  - How to ensure JSON/response structure compatibility across fallback chain?
- **Validation after fix**:
  - Server starts without blocking on model download.
  - SIF agents use consistent model defaults.
  - Fail-fast catches bugs but allows fallback for non-critical ops.
  - Single module handles all model loading logic.

## Minimal Fix Paths (Pre-Implementation)

### For Issue 3 (txtai init race) - REVISED:
- **Option A**: Change `SIF_FAIL_FAST` to be **conditional** (not global)
  - Keep fail-fast for critical paths (user-initiated ops)
  - Allow graceful deferral for background ops (indexing, clustering)
  - Requires distinguishing operation types
- **Option B**: Use `_ensure_initialized_async()` for **blocking ops only**
  - Keep non-blocking for background ops
  - Awaits init for user-facing ops
  - More robust but requires async refactoring
- **Option C**: Add operation-type-aware fail-fast
  - Pass `critical=True/False` to operations
  - Fail-fast only when `critical=True`
  - Most aligned with user requirements

### For Issue 5 (SIF agent local-model drift) - REVISED:
- **Option A**: Change default to lighter model AND improve fallback chain
  - Default: `Qwen/Qwen2.5-0.5B-Instruct` (lighter)
  - Fallback: `0.5B → TinyLlama 1.1B`
  - Ensure JSON/response structure compatibility
- **Option B**: Add env/config override + keep fallback chain
  - `SIF_AGENT_MODEL` env var
  - Fallback chain remains as-is
  - More flexible
- **Option C**: Keep current default and rely on existing fallback chain
  - **RECOMMENDED**: Already has memory detection and fallback
  - Just need to ensure JSON compatibility

### For Issue 6 (Model blocking + module unification):
- **Option A**: Remove blocking download from startup script
  - Delete `bootstrap_local_llm_models()` call
  - Let `LocalLLMWrapper` handle lazy loading
  - Minimal change, immediate non-blocking
- **Option B**: Make download non-blocking (background thread)
  - Keep pre-download but in background
  - Server starts immediately
  - More complex
- **Option C**: Create unified `ModelRegistry` module
  - Single source of truth for model defaults
  - Centralized download/cache logic
  - Eliminates confusion between modules
  - **RECOMMENDED for long-term**

## Session Update Log

### 2026-03-10

- Created initial RCA tracker document.
- Seeded first three concrete issues from supplied logs.
- No fixes applied from this document yet.
- Added Issue 5: SIF agent local-model drift (LiteLLM dropped as false herring).
- Refined Issue 3 with `SIF_FAIL_FAST` behavior details.
- Added minimal fix paths for Issues 3 and 5.
- Added Issue 6: Model initialization blocking and module unification.
- Updated minimal fix paths based on user requirements (fail-fast + fallback, non-blocking, unification).