| title | updated |
|---|---|
| SIF and AI Tools model LLM choices | 2026-03-11 |
SIF and AI Tools model LLM choices
This document captures the intended LLM/provider split between:
- Premium AI tools (podcast, story writer, blog writer, etc.)
- SIF / agents (local-first intelligence workflows)
It also records recent fixes, root causes, and consolidation next steps.
1) Design Intent (Target Behavior)
A) Premium AI Tools
Use the remote premium API path by default.
- Primary provider route: Hugging Face router
- Preferred premium model: openai/gpt-oss-120b:groq
- GPT_PROVIDER values that should map to this premium remote text route:
  - huggingface
  - hf
  - hf_response_api
  - wavespeed (alias mapping for premium remote route)
Fallback policy for premium tools:
- Keep fallback minimal and explicit.
- Do not accidentally inherit SIF low-cost fallback chains.
- If the provider is explicitly pinned per call (preferred_provider), avoid cross-provider switching to reduce noisy retries and cost/time waste.
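The alias mapping above can be sketched as a small lookup. This is a minimal illustration, not the actual routing code: the constant names and the function `resolve_premium_route` are hypothetical, and the case-insensitive matching is an assumption.

```python
import os

# Hypothetical constant names; the real module may organize these differently.
PREMIUM_TEXT_ROUTE = "huggingface"
PREMIUM_DEFAULT_MODEL = "openai/gpt-oss-120b:groq"

# GPT_PROVIDER values listed in this document as mapping to the premium remote text route.
PREMIUM_ROUTE_ALIASES = frozenset({"huggingface", "hf", "hf_response_api", "wavespeed"})


def resolve_premium_route(gpt_provider):
    """Return the premium route name if the value is a known alias, else None."""
    if not gpt_provider:
        return None
    # Assumption: matching ignores case and surrounding whitespace.
    normalized = gpt_provider.strip().lower()
    return PREMIUM_TEXT_ROUTE if normalized in PREMIUM_ROUTE_ALIASES else None


if __name__ == "__main__":
    print(resolve_premium_route(os.getenv("GPT_PROVIDER", "")))
```

Returning None for unknown values leaves the decision about other providers to the caller instead of silently falling back to the premium route.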
B) SIF / Agents
Use a local-first strategy.
- Primary: local models (where SIF pipeline supports them)
- Fallback: smaller remote models (HF + environment-guided provider logic)
- Explicit low-cost model lists should be passed by SIF wrappers (e.g., preferred_hf_models) to keep these flows distinct from premium tools.
2) Current Routing Contract in llm_text_gen
llm_text_gen(...) now supports explicit context signals:
- preferred_provider: pin provider intent for tool-specific flows
- preferred_hf_models: low-cost model list for SIF/agent fallback usage
- flow_type: diagnostic tag (premium_tool vs sif_agent)
Flow separation rule
- If preferred_hf_models is used (SIF path), that list drives HF model selection/fallback.
- Premium tool calls should not pass SIF low-cost lists.
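The flow separation rule can be illustrated with a short sketch. The helper `select_hf_models` is hypothetical and only mirrors the rule as stated here; the real selection logic inside llm_text_gen may differ, and treating an empty list the same as an omitted one is an assumption.

```python
from typing import List, Optional, Sequence

PREMIUM_DEFAULT_MODEL = "openai/gpt-oss-120b:groq"


def select_hf_models(preferred_hf_models: Optional[Sequence[str]] = None) -> List[str]:
    """Return the ordered HF model candidates for a single call."""
    if preferred_hf_models:
        # SIF path: the caller-supplied list drives selection and fallback order.
        return list(preferred_hf_models)
    # Premium path (assumption: None or empty means "no SIF list"):
    # one default model, with no inherited low-cost chain.
    return [PREMIUM_DEFAULT_MODEL]
```

A SIF wrapper would pass its own low-cost list, while a premium tool would call the helper without one and get only the premium default.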
Diagnostics
Logs include:
- [llm_text_gen][flow_type=premium_tool] ...
- [llm_text_gen][flow_type=sif_agent] ...
This makes mixed routing issues visible immediately.
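A tagged log line of this shape could be produced with a helper like the one below. The function name `format_route_log` is hypothetical; the message layout follows the example line given in the validation checklist of this document.

```python
import logging

logger = logging.getLogger("llm_text_gen")


def format_route_log(flow_type: str, provider: str, model: str) -> str:
    """Build the flow-tagged routing message used for diagnostics."""
    return f"[llm_text_gen][flow_type={flow_type}] Using provider={provider}, model={model}"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger.info(format_route_log("premium_tool", "huggingface", "openai/gpt-oss-120b:groq"))
```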
3) Key Issues Found and Fixes Applied
Issue A: Premium/SIF behavior got mixed
Symptoms:
- premium calls iterating through low-cost fallback chains
- noisy model-not-found logs
- wasted latency and confusion over routing
Fix:
- made fallback model chain caller-controlled
- kept SIF-specific fallback models passed only from SIF wrappers
- kept premium calls separate and explicitly tagged
Issue B: Podcast bible generation error (NoneType callable)
Symptoms:
- services.podcast_bible_service:generate_bible -> 'NoneType' object is not callable
Root cause:
- personalization session acquisition/payload handling edge cases
Fix:
- safe DB session retrieval via user-scoped session function
- non-dict guardrails for integrated payload/canonical profile
- fallback to defaults instead of crashing
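The non-dict guardrail and default fallback can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: `coerce_to_dict` is a hypothetical helper, and overlaying a valid payload on top of the defaults is one possible merge policy.

```python
from typing import Any, Dict, Mapping


def coerce_to_dict(value: Any, defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a usable dict even when the payload is None or malformed."""
    if not isinstance(value, dict):
        # Malformed or missing payload: fall back to a copy of the defaults.
        return dict(defaults)
    # Assumption: valid payload keys override the defaults.
    merged = dict(defaults)
    merged.update(value)
    return merged
```

Applied to both the integrated payload and the canonical profile, a check like this lets generation continue with defaults when onboarding data is absent or has an unexpected type.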
Issue C: Premium default model drift
Symptoms:
- premium default shifted to smaller model in recent patches
Fix:
- restored premium default model to: openai/gpt-oss-120b:groq
- kept wavespeed env alias mapped to premium remote text route logic
4) Provider Notes
Hugging Face provider
- Accepts explicit fallback_models list.
- If fallback_models=[], no broad fallback chain is injected beyond direct model variant handling.
Wavespeed
- Wavespeed services exist in codebase and are used for dedicated workloads.
- In text routing context (llm_text_gen), GPT_PROVIDER=wavespeed is treated as an alias to the premium remote text route (HF provider path), preserving current behavior without introducing a second text-provider implementation in this function.
5) Operational Validation Checklist
When testing /api/podcast/idea/enhance:
- Verify request log and auth token attachment in frontend.
- Verify backend log shows:
[llm_text_gen][flow_type=premium_tool] Using provider=huggingface, model=openai/gpt-oss-120b:groq
- Verify no SIF-specific low-cost model list is being used in this flow.
- Verify no repeated broad fallback cascades unless explicitly configured.
- Verify podcast bible generation does not crash and gracefully falls back to defaults if onboarding payload is malformed.
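The backend log check in this list could be automated with a small parser. The regular expression below is written against the example log line shown in the checklist; `parse_route_log` is a hypothetical helper, and real log lines may carry extra prefixes such as timestamps, which `search` tolerates.

```python
import re
from typing import Dict, Optional

# Matches the flow-tagged routing line shown in the checklist.
ROUTE_LOG_PATTERN = re.compile(
    r"\[llm_text_gen\]\[flow_type=(?P<flow_type>[^\]]+)\] "
    r"Using provider=(?P<provider>[^,]+), model=(?P<model>\S+)"
)


def parse_route_log(line: str) -> Optional[Dict[str, str]]:
    """Extract flow_type, provider, and model from a routing log line."""
    match = ROUTE_LOG_PATTERN.search(line)
    return match.groupdict() if match else None
```

A test for /api/podcast/idea/enhance could then assert that the parsed flow_type is premium_tool and that the model matches the premium default.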
6) Consolidation Next Steps
- Centralize routing policy constants
  - define premium defaults and SIF defaults in one module
  - avoid drift from scattered hardcoded model strings
- Add explicit route_intent enum (optional)
  - premium_tool, sif_local_first, sif_remote_fallback
  - reduce ambiguity vs inferred behavior
- Add unit tests for routing matrix
  - test combinations of:
    - GPT_PROVIDER
    - preferred_provider
    - preferred_hf_models
    - key presence/absence
- Add structured log fields
  - route_intent, provider_selected, model_selected, fallback_count
  - easier production RCA
- Document model availability assumptions
  - account-level HF router model availability differs across keys/orgs
  - include fallback policy per environment (dev/staging/prod)
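As a starting point for the route_intent enum and the structured log fields, one possible shape is sketched below. None of this exists in the codebase yet: the class names `RouteIntent` and `RouteLogFields` and the method `as_log_extra` are hypothetical, while the enum values and field names are the ones proposed in this list.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict


class RouteIntent(str, Enum):
    """Proposed explicit routing intents."""
    PREMIUM_TOOL = "premium_tool"
    SIF_LOCAL_FIRST = "sif_local_first"
    SIF_REMOTE_FALLBACK = "sif_remote_fallback"


@dataclass(frozen=True)
class RouteLogFields:
    """Structured fields to attach to each routing log record."""
    route_intent: RouteIntent
    provider_selected: str
    model_selected: str
    fallback_count: int = 0

    def as_log_extra(self) -> Dict[str, Any]:
        # Emit the enum as its plain string value so log sinks can index it.
        fields = asdict(self)
        fields["route_intent"] = self.route_intent.value
        return fields
```

Keeping the enum and the log fields in the same module as the centralized model defaults would give the routing matrix tests a single place to import from.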
7) Practical Rule of Thumb
- If the caller is a premium AI tool: call with premium provider intent and do not pass the SIF low-cost list.
- If the caller is SIF/agent: go local-first, then explicitly pass the low-cost remote fallback list.
- Keep these paths separate in code and logs.