Add Step 5 flat context and txtai file tools for agents

ي
2026-03-11 10:42:05 +05:30
parent b410ece4ca
commit cbe41ef8c7
13 changed files with 1480 additions and 7 deletions


@@ -0,0 +1,140 @@
# Flat File Context Framework Design (Agent-Optimized)
## Purpose
Design a **compact, machine-first flat-file framework** for ALwrity AI agents.
This framework is optimized for:
- deterministic structure,
- minimal token footprint,
- fast parsing,
- high-signal retrieval,
- robust fallback behavior.
## Core Principles
1. **Agent-first, not human-first**
- Keys are short and stable.
- Avoid verbose prose in payloads.
- Include only fields needed for reasoning and tool actions.
2. **Compact + predictable schema**
- Fixed top-level keys in strict order.
- Canonical value types (no shape drift).
- Avoid polymorphic fields when possible.
3. **Dual-layer context**
- `d` (full normalized data for deep reasoning).
- `s` (summary/high-signal fast path for most agent reads).
4. **Fallback-safe design**
- Every context doc includes source + freshness metadata.
- If the file is missing or stale, consumers fall back to the DB, then to SIF semantic.
5. **Multi-tenant isolation**
- Per-user files live under `workspace/workspace_<safe_user_id>/agent_context/`.
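
A minimal sketch of how the per-user directory could be resolved. The sanitization rule (keep only `[A-Za-z0-9_-]`) and the helper names `safe_user_id` and `agent_context_dir` are assumptions for illustration; the actual ALwrity rule for `<safe_user_id>` is not specified here.

```python
import re
from pathlib import Path


def safe_user_id(user_id: str) -> str:
    """Map a raw user id to a filesystem-safe token (assumed rule)."""
    # Replace every character outside [A-Za-z0-9_-] so the id cannot
    # contain path separators or dots that escape the user directory.
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)
    return cleaned or "unknown"


def agent_context_dir(root: Path, user_id: str) -> Path:
    """Return workspace/workspace_<safe_user_id>/agent_context under root."""
    return root / "workspace" / f"workspace_{safe_user_id(user_id)}" / "agent_context"
```

Because the id is sanitized before it reaches the path join, a hostile id such as `../etc/passwd` stays inside its own `workspace_*` directory.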
---
## Canonical Context Envelope (compact)
```json
{
"v": "1.0",
"t": "onboarding.step2.website_analysis",
"u": "<user_id>",
"ts": "<iso8601>",
"src": "onboarding_step2",
"d": {},
"s": {},
"m": {
"db": 0,
"sb": 0,
"q": []
}
}
```
### Field map
- `v`: schema version
- `t`: context type
- `u`: user id
- `ts`: updated timestamp
- `src`: source writer
- `d`: canonical normalized data
- `s`: high-signal summary for quick agent use
- `m`: meta (`db`=data bytes, `sb`=summary bytes, `q`=query hints)
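
The field map above can be sketched as a small builder. This is an illustration only: `build_envelope` is a hypothetical helper, and measuring `db`/`sb` as the UTF-8 length of compact JSON is an assumption about how "bytes" are counted.

```python
import json
from datetime import datetime, timezone

# Fixed top-level key order, matching the canonical envelope.
ENVELOPE_KEYS = ("v", "t", "u", "ts", "src", "d", "s", "m")


def _size(obj) -> int:
    """Size in bytes of the compact UTF-8 JSON encoding (assumed metric)."""
    return len(json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def build_envelope(context_type, user_id, source, data, summary, query_hints=()):
    """Assemble an envelope with keys in strict canonical order."""
    return {
        "v": "1.0",
        "t": context_type,
        "u": user_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "src": source,
        "d": data,
        "s": summary,
        "m": {"db": _size(data), "sb": _size(summary), "q": list(query_hints)},
    }
```

Python dicts preserve insertion order, so building the dict in one literal is enough to keep the top-level keys in the documented order when serialized.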
---
## Agent Readability Best Practices
- Prefer enums/controlled vocab over free text.
- Use compact keys and arrays for repetitive entities.
- Truncate long textual blobs unless explicitly required.
- Keep “quick facts” flattened.
- Separate operational metadata from semantic content.
- Include retrieval hints (`q`) for consistent query drafting.
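
Two of the practices above (controlled vocabularies and truncation) can be sketched as follows. The `Tone` enum, its members, and the helper names are hypothetical; they stand in for whatever vocab a given step actually defines.

```python
from enum import Enum


class Tone(str, Enum):
    """Hypothetical controlled vocabulary for a free-text 'tone' field."""
    FORMAL = "formal"
    CASUAL = "casual"
    UNKNOWN = "unknown"


def to_tone(raw: str) -> str:
    """Normalize free text to an enum value; unmapped input becomes 'unknown'."""
    try:
        return Tone(raw.strip().lower()).value
    except ValueError:
        return Tone.UNKNOWN.value


def truncate(text: str, limit: int = 120) -> str:
    """Cap a text blob at `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."
```

Mapping unknown input to a single sentinel value keeps the field shape stable, which is the point of preferring enums over free text.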
---
## Write Pipeline Pattern
1. Normalize incoming source payload.
2. Derive compact summary (`s`) from normalized data.
3. Compute lightweight metadata (`m`).
4. Write the JSON file atomically.
5. Emit the writer version and timestamp.
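
Step 4 is the part most easily gotten wrong, so here is a minimal sketch of one common approach: write to a temporary file in the same directory, then rename over the target. `write_context_atomic` is a hypothetical name, and this is not necessarily how the committed writer does it.

```python
import json
import os
import tempfile
from pathlib import Path


def write_context_atomic(path: Path, envelope: dict) -> None:
    """Write `envelope` as compact JSON so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the target directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp_", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(envelope, fh, separators=(",", ":"), ensure_ascii=False)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the path at any moment sees either the previous complete document or the new one, never a half-written file.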
## Read Pipeline Pattern
1. Attempt flat-file load.
2. Validate minimum envelope fields (`v`, `t`, `u`, `ts`, `d`).
3. Prefer `s` for quick tasks; use `d` for deeper reasoning.
4. If the file is invalid, missing, or stale, fall back to DB, then SIF semantic.
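
The read steps above can be sketched as below. The staleness threshold `max_age` and the `from_db` / `from_sif` callables are assumptions; they are placeholders for the real DB and SIF lookups, whose interfaces are not described here.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

REQUIRED = ("v", "t", "u", "ts", "d")


def is_usable(doc, max_age: timedelta) -> bool:
    """True if doc has the minimum envelope fields and a fresh timestamp."""
    if not isinstance(doc, dict) or any(key not in doc for key in REQUIRED):
        return False
    try:
        ts = datetime.fromisoformat(doc["ts"])
    except (TypeError, ValueError):
        return False
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return datetime.now(timezone.utc) - ts <= max_age


def load_flat(path: Path, max_age: timedelta):
    """Return the parsed envelope, or None if missing, invalid, or stale."""
    try:
        doc = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    return doc if is_usable(doc, max_age) else None


def read_context(path, max_age, from_db, from_sif, deep=False):
    """Flat file first (summary unless deep=True), then DB, then SIF."""
    doc = load_flat(path, max_age)
    if doc is not None:
        return doc["d"] if deep or not doc.get("s") else doc["s"]
    return from_db() or from_sif()
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so writers that emit `+00:00` offsets are the safer pairing for this check.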
---
## Scope Expansion Pattern
Apply the same envelope to:
- Step 2: website analysis
- Step 3: research preferences + competitor snapshots
- Step 4: persona profile + platform personas
Only `t` and the `d` and `s` payload contracts should vary.
---
## Governance
- Schema changes require version bump (`v`).
- Backward compatibility policy: readers support N and N-1.
- Drift checks should compare canonical hash/checksum vs DB latest row.
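
A sketch of the two governance checks. It assumes that "N and N-1" refers to the major component of `v`, and that the canonical hash is SHA-256 over key-sorted compact JSON; neither is confirmed by this document, and the function names are hypothetical.

```python
import hashlib
import json


def reader_supports(doc_version: str, reader_major: int) -> bool:
    """Accept documents at the reader's major version N or at N-1."""
    try:
        major = int(str(doc_version).split(".")[0])
    except ValueError:
        return False
    return major in (reader_major, reader_major - 1)


def canonical_hash(data) -> str:
    """SHA-256 of key-sorted compact JSON, so key order never changes the hash."""
    blob = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def has_drifted(flat_data, db_latest) -> bool:
    """True when the flat-file payload no longer matches the latest DB row."""
    return canonical_hash(flat_data) != canonical_hash(db_latest)
```

Sorting keys before hashing matters here: two payloads with the same content but different key order should not be reported as drift.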
## Document Context + End-User Journey Metadata
Each context file should carry explicit machine-oriented document metadata so agents understand *what this file is* before reading full payloads.
Suggested `document_context` fields:
- `audience`: `ai_agents`
- `purpose`: `fast_context_retrieval`
- `context_type`: step-scoped type identifier
- `journey`: stage/action/agent expectation
- `retrieval_contract`: preferred source + fallback order
- `context_window_guidance`: byte budget and summary-first policy
This block is intentionally compact and deterministic to reduce wasted token usage for agent planning.
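
For illustration, a `document_context` block might be assembled as below. Only the six top-level field names and the `ai_agents` / `fast_context_retrieval` values come from the list above; every nested key and value (`stage`, `preferred`, `fallback`, `byte_budget`, `summary_first`) is a hypothetical shape.

```python
def build_document_context(context_type: str, stage: str, byte_budget: int) -> dict:
    """Build a compact, deterministic document_context block (assumed shape)."""
    return {
        "audience": "ai_agents",
        "purpose": "fast_context_retrieval",
        "context_type": context_type,
        # Nested shapes below are illustrative assumptions.
        "journey": {"stage": stage},
        "retrieval_contract": {
            "preferred": "flat_file",
            "fallback": ["db", "sif_semantic"],
        },
        "context_window_guidance": {
            "byte_budget": byte_budget,
            "summary_first": True,
        },
    }
```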
## Context Window and Length Policy
- Keep combined `data + summary` (`d` + `s` in the compact envelope) under a defined byte budget where practical.
- Enforce summary-first reads in agent consumers.
- Truncate long textual fields in summaries; keep full text only in `data` (`d`) when needed.
- Flag oversize docs in metadata so readers can skip low-priority sections.
- Prefer short, stable keys in machine envelopes and avoid natural-language verbosity.
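
A minimal sketch of this policy: truncate long summary strings, measure both layers, and flag the document when it exceeds the budget. The metadata keys `trimmed` and `oversize`, the `max_chars` default, and the function names are assumptions, not the committed contract.

```python
import json


def size_bytes(obj) -> int:
    """Bytes of the compact UTF-8 JSON encoding (assumed metric)."""
    return len(json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def apply_budget(data: dict, summary: dict, budget_bytes: int, max_chars: int = 200):
    """Return (trimmed_summary, meta) with deterministic trimming metadata."""
    compact, trimmed = {}, []
    for key, value in summary.items():
        if isinstance(value, str) and len(value) > max_chars:
            compact[key] = value[:max_chars]  # full text stays in `data` only
            trimmed.append(key)
        else:
            compact[key] = value
    db, sb = size_bytes(data), size_bytes(compact)
    meta = {
        "db": db,
        "sb": sb,
        "trimmed": trimmed,                   # which summary fields were cut
        "oversize": db + sb > budget_bytes,   # lets readers skip low-priority docs
    }
    return compact, meta
```

Recording which fields were trimmed keeps the operation deterministic and auditable: the same input always yields the same summary and the same metadata.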
## Implemented Baseline Controls
- Atomic file writes to avoid partial documents.
- Best-effort restricted file permissions (`0600`).
- Recursive sensitive-key redaction for payload snapshots.
- Payload size budget enforcement with deterministic trimming metadata.
- Internal document linking via `related_documents` and manifest index.
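
Two of these controls, sketched under stated assumptions. The marker list in `SENSITIVE_MARKERS`, the `[REDACTED]` placeholder, and the function names are hypothetical; the committed implementation may match keys differently.

```python
import os

# Assumed substrings that mark a key as sensitive (case-insensitive).
SENSITIVE_MARKERS = ("password", "secret", "token", "api_key")


def _is_sensitive(key) -> bool:
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def redact(value):
    """Recursively replace values under sensitive keys in dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if _is_sensitive(key) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def restrict_permissions(path) -> bool:
    """Best-effort chmod 0600; returns False instead of raising on failure."""
    try:
        os.chmod(path, 0o600)
        return True
    except OSError:
        return False
```

`redact` returns a new structure rather than mutating its input, so the original payload stays available to the caller while only the redacted copy is written to disk.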
Security and isolation details: `docs/flat_file_context/FLAT_FILE_CONTEXT_SECURITY_AND_ISOLATION.md`
Step docs: `docs/flat_file_context/STEP2_FLAT_FILE_CONTEXT_DESIGN.md`, `docs/flat_file_context/STEP3_FLAT_FILE_CONTEXT_DESIGN.md`, `docs/flat_file_context/STEP4_FLAT_FILE_CONTEXT_DESIGN.md`, `docs/flat_file_context/STEP5_FLAT_FILE_CONTEXT_DESIGN.md`