Files
ALwrity/docs/Billing_Subscription/LOG_STORAGE_AND_RETENTION_REVIEW.md

11 KiB

Log Storage and Retention Review

Executive Summary

This document reviews the storage limits, retention policies, and log management mechanisms for:

  1. API Usage Logs (api_usage_logs table)
  2. Subscription Renewal History (subscription_renewal_history table)

1. API Usage Logs

Current Storage Limits

Per-User Limit:

  • Maximum Logs Per User: 5,000 logs (defined in LogWrappingService.MAX_LOGS_PER_USER)
  • Detailed Logs Kept: 4,000 most recent logs
  • Aggregation Threshold: Logs older than 30 days OR beyond the 4,000 limit are aggregated

API Query Limits:

  • Frontend Default: 50 logs per page (configurable: 10, 25, 50, 100)
  • Backend Maximum: 5,000 logs per query (limit parameter: ge=1, le=5000)
  • Pagination: Fully supported with offset and limit parameters

Log Wrapping/Aggregation Mechanism

Service: LogWrappingService (backend/services/subscription/log_wrapping_service.py)

How It Works:

  1. Automatic Check: Triggered on every /usage-logs API call via check_and_wrap_logs()
  2. Threshold Detection: When user exceeds 5,000 logs
  3. Aggregation Strategy:
    • Keeps most recent 4,000 logs as detailed records
    • Aggregates oldest logs beyond 4,000 limit
    • Groups by provider and billing period
    • Creates aggregated log entries with:
      • Total counts, tokens, costs
      • Average response time
      • Success/failure counts
      • Time range (oldest to newest timestamp)
    • Deletes individual logs that were aggregated

Aggregated Log Format:

  • endpoint: "[AGGREGATED]"
  • method: "AGGREGATED"
  • model_used: "[{count} calls aggregated]"
  • error_message: Contains summary (e.g., "Aggregated 150 calls: 145 success, 5 failed")
  • is_aggregated: Flag set to true in frontend

Context Preservation:

  • Preserved: Total costs, tokens, call counts, success/failure rates, time ranges
  • Preserved: Provider and billing period grouping
  • Preserved: Average response time
  • Lost: Individual endpoint details, specific error messages, request/response sizes

Current Implementation Status

Implemented:

  • Automatic log wrapping when limit exceeded
  • Aggregation by provider and billing period
  • Context preservation for aggregated data
  • Frontend display of aggregated logs with special formatting

⚠️ Potential Issues:

  1. No Time-Based Retention: Only count-based, not age-based cleanup
  2. No Manual Cleanup Script: No scheduled job to clean very old logs
  3. Database Growth: Aggregated logs still count toward the 5,000 limit
  4. No Archive Strategy: No mechanism to move old logs to archive tables

Recommendations

  1. Add Time-Based Retention:

    • Archive logs older than 12 months
    • Keep aggregated logs for 24 months
    • Delete logs older than 24 months
  2. Improve Aggregation Strategy:

    • Consider aggregating by month for logs older than 90 days
    • Create separate archive table for very old logs
    • Implement tiered storage (hot/warm/cold)
  3. Add Cleanup Script:

    • Scheduled job to run monthly
    • Archive old logs before deletion
    • Maintain audit trail

2. Subscription Renewal History

Current Storage Limits

Per-User Limit:

  • No Hard Limit: Unlimited storage (no cleanup/aggregation)
  • API Query Limit: Maximum 100 records per query (limit parameter: ge=1, le=100)
  • Frontend Default: 20 records per page (configurable: 10, 20, 50, 100)

Storage Characteristics:

  • One record per renewal/upgrade/downgrade event
  • Includes usage snapshot before renewal (usage_before_renewal JSON field)
  • Includes payment information
  • Includes period information (start/end dates)

Current Implementation Status

Implemented:

  • Full history tracking for all subscription events
  • Usage snapshots preserved in JSON format
  • Pagination support
  • No automatic cleanup (preserves all history)

⚠️ Potential Issues:

  1. Unlimited Growth: No retention policy - will grow indefinitely
  2. Large JSON Snapshots: usage_before_renewal can be large for active users
  3. No Archive Strategy: All records kept in primary table
  4. No Cleanup Script: No mechanism to archive old records

Recommendations

  1. Add Retention Policy:

    • Keep detailed records for last 24 months
    • Archive records older than 24 months
    • Keep summary records (without full usage snapshots) for 7 years (tax/audit)
  2. Optimize Storage:

    • Compress usage_before_renewal JSON for old records
    • Create summary table for very old records
    • Remove detailed usage snapshots after 12 months
  3. Add Cleanup Script:

    • Monthly job to archive records older than 24 months
    • Maintain summary records for compliance
    • Preserve payment information indefinitely

3. Log Replay Mechanism

Current Status

No Log Replay: There is no mechanism to replay or reconstruct usage from logs.

What Would Be Needed:

  1. Event Sourcing Pattern: Store events that can be replayed
  2. Replay Service: Service to process logs and rebuild state
  3. State Reconstruction: Ability to rebuild UsageSummary from APIUsageLog entries

Current Data Flow

API Call → monitoring_middleware → UsageTrackingService.track_api_usage()
  ↓
APIUsageLog (individual record)
  ↓
UsageSummary (aggregated by billing period)

Issue: If UsageSummary is corrupted or lost, it cannot be fully reconstructed from APIUsageLog because:

  • Aggregation happens in real-time
  • No event sourcing pattern
  • No replay mechanism

Recommendations

  1. Add Replay Capability:

    • Create replay_usage_logs() function in UsageTrackingService
    • Rebuild UsageSummary from APIUsageLog entries
    • Support replay for specific billing periods
  2. Add Validation:

    • Periodic job to validate UsageSummary against APIUsageLog
    • Detect discrepancies and auto-correct
    • Alert on data inconsistencies
  3. Consider Event Sourcing (Future):

    • Store events instead of just logs
    • Enable full state reconstruction
    • Support time-travel queries

4. Summary and Action Items

Current State

Metric API Usage Logs Renewal History
Per-User Limit 5,000 logs Unlimited
Aggregation Yes (automatic) No
Retention Policy ⚠️ Count-based only None
Cleanup Script No No
Log Replay No No
Archive Strategy No No

Priority Actions

High Priority:

  1. Log Wrapping Works: Already implemented and functional
  2. ⚠️ Add Time-Based Retention: Implement age-based cleanup for API logs
  3. ⚠️ Add Renewal History Retention: Implement retention policy for renewal history

Medium Priority: 4. Add Cleanup Scripts: Create scheduled jobs for both tables 5. Add Archive Tables: Create archive tables for old data 6. Add Replay Capability: Enable reconstruction of UsageSummary from logs

Low Priority: 7. Optimize Storage: Compress JSON fields, optimize indexes 8. Add Monitoring: Alert on storage growth, aggregation events 9. Documentation: Document retention policies for users

Code Locations

Log Wrapping:

  • backend/services/subscription/log_wrapping_service.py
  • Triggered in: backend/api/subscription/routes/logs.py (line 86-89)

Usage Logs API:

  • backend/api/subscription/routes/logs.py
  • Frontend: frontend/src/components/billing/UsageLogsTable.tsx

Renewal History API:

  • backend/api/subscription/routes/subscriptions.py (line 519-586)
  • Frontend: frontend/src/components/billing/SubscriptionRenewalHistory.tsx

Models:

  • backend/models/subscription_models.py
    • APIUsageLog (line 127-173)
    • SubscriptionRenewalHistory (line 341-389)

API Usage Logs

┌─────────────────────────────────────────────────────────────┐
│ Retention Policy: API Usage Logs                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│ 0-30 days:     Detailed logs (all fields)                   │
│ 30-90 days:    Detailed logs (keep 4,000 most recent)       │
│ 90-365 days:   Aggregated by month                          │
│ 365-730 days:  Aggregated by quarter                       │
│ 730+ days:     Archive to separate table                    │
│                                                              │
│ Max per user:  5,000 records (detailed + aggregated)        │
│ Archive table: Unlimited (for compliance/audit)            │
└─────────────────────────────────────────────────────────────┘

Subscription Renewal History

┌─────────────────────────────────────────────────────────────┐
│ Retention Policy: Renewal History                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│ 0-12 months:   Full records with usage snapshots             │
│ 12-24 months:  Full records (compressed snapshots)          │
│ 24-84 months:  Summary records (no usage snapshots)        │
│ 84+ months:    Archive to separate table                    │
│                                                              │
│ Payment data:  Keep indefinitely (tax/audit compliance)    │
│ Usage snapshots: Remove after 12 months                     │
└─────────────────────────────────────────────────────────────┘

6. Implementation Plan

Phase 1: Immediate (No Breaking Changes)

  1. Document current behavior
  2. Add monitoring/alerts for log counts
  3. Add database indexes for performance

Phase 2: Retention Policies (Backward Compatible)

  1. Add time-based retention to log wrapping
  2. Create archive tables
  3. Add cleanup scripts (manual execution)

Phase 3: Automation

  1. Schedule cleanup jobs (cron/scheduler)
  2. Add replay capability
  3. Add validation/audit jobs

Phase 4: Optimization

  1. Compress JSON fields
  2. Optimize queries with better indexes
  3. Add caching for frequently accessed data