Log Storage and Retention Review
Executive Summary
This document reviews the storage limits, retention policies, and log management mechanisms for:
- API Usage Logs (api_usage_logs table)
- Subscription Renewal History (subscription_renewal_history table)
1. API Usage Logs
Current Storage Limits
Per-User Limit:
- Maximum Logs Per User: 5,000 logs (defined in LogWrappingService.MAX_LOGS_PER_USER)
- Detailed Logs Kept: 4,000 most recent logs
- Aggregation Threshold: Logs older than 30 days OR beyond the 4,000 limit are aggregated
API Query Limits:
- Frontend Default: 50 logs per page (configurable: 10, 25, 50, 100)
- Backend Maximum: 5,000 logs per query (limit parameter: ge=1, le=5000)
- Pagination: Fully supported with offset and limit parameters
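The query bounds above can be illustrated with a small pagination helper. This is a minimal sketch, not the code in logs.py: the function name paginate and the constant MAX_QUERY_LIMIT are hypothetical, and only the ge=1, le=5000 bounds and the default page size of 50 come from this review.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

MAX_QUERY_LIMIT = 5000  # backend maximum stated in this review (le=5000)


def paginate(rows: Sequence[T], offset: int = 0, limit: int = 50) -> list[T]:
    """Return one page of rows, enforcing the same bounds as ge=1, le=5000."""
    if not 1 <= limit <= MAX_QUERY_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_QUERY_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    # Slicing past the end simply yields a shorter (or empty) page.
    return list(rows[offset:offset + limit])
```

Rejecting an out-of-range limit, rather than silently clamping it, mirrors how a validated query parameter would typically behave.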
Log Wrapping/Aggregation Mechanism
Service: LogWrappingService (backend/services/subscription/log_wrapping_service.py)
How It Works:
- Automatic Check: Triggered on every /usage-logs API call via check_and_wrap_logs()
- Threshold Detection: When user exceeds 5,000 logs
- Aggregation Strategy:
  - Keeps most recent 4,000 logs as detailed records
  - Aggregates oldest logs beyond 4,000 limit
  - Groups by provider and billing period
  - Creates aggregated log entries with:
    - Total counts, tokens, costs
    - Average response time
    - Success/failure counts
    - Time range (oldest to newest timestamp)
  - Deletes individual logs that were aggregated
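The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual LogWrappingService code: the UsageLog dataclass, its field names, and the wrap_logs function are hypothetical stand-ins, and the sketch works on an in-memory list instead of database rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

MAX_LOGS_PER_USER = 5000   # threshold that triggers wrapping
DETAILED_LOGS_KEPT = 4000  # most recent logs left as detailed records


@dataclass
class UsageLog:  # hypothetical stand-in for APIUsageLog
    timestamp: datetime
    provider: str
    billing_period: str
    tokens: int
    cost: float
    response_time: float
    success: bool


def wrap_logs(logs, max_logs=MAX_LOGS_PER_USER, keep=DETAILED_LOGS_KEPT):
    """Return (detailed_logs, summaries) for one user's logs."""
    if len(logs) <= max_logs:
        return list(logs), []  # under the limit: nothing to aggregate

    newest_first = sorted(logs, key=lambda log: log.timestamp, reverse=True)
    detailed, overflow = newest_first[:keep], newest_first[keep:]

    # Group the overflow by provider and billing period.
    groups = defaultdict(list)
    for log in overflow:
        groups[(log.provider, log.billing_period)].append(log)

    summaries = []
    for (provider, period), group in groups.items():
        successes = sum(1 for log in group if log.success)
        summaries.append({
            "provider": provider,
            "billing_period": period,
            "call_count": len(group),
            "total_tokens": sum(log.tokens for log in group),
            "total_cost": sum(log.cost for log in group),
            "avg_response_time": sum(log.response_time for log in group) / len(group),
            "success_count": successes,
            "failure_count": len(group) - successes,
            "oldest": min(log.timestamp for log in group),
            "newest": max(log.timestamp for log in group),
        })
    return detailed, summaries
```

In a real service the overflow rows would be deleted only after the summaries are committed, so a failure midway cannot lose data.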
Aggregated Log Format:
- endpoint: "[AGGREGATED]"
- method: "AGGREGATED"
- model_used: "[{count} calls aggregated]"
- error_message: Contains summary (e.g., "Aggregated 150 calls: 145 success, 5 failed")
- is_aggregated: Flag set to true in frontend
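As a sketch of how such an entry could be assembled, the helper below fills the fields listed above from a call count and a success count. The function names are hypothetical, and deriving is_aggregated from the endpoint marker is an assumption about how the frontend flag might be computed; the review does not say how it is set.

```python
AGGREGATED_ENDPOINT = "[AGGREGATED]"


def build_aggregated_entry(call_count: int, success_count: int) -> dict:
    """Build the display fields of an aggregated log entry."""
    failed = call_count - success_count
    return {
        "endpoint": AGGREGATED_ENDPOINT,
        "method": "AGGREGATED",
        "model_used": f"[{call_count} calls aggregated]",
        "error_message": (
            f"Aggregated {call_count} calls: "
            f"{success_count} success, {failed} failed"
        ),
    }


def is_aggregated(entry: dict) -> bool:
    """Assumed detection rule: aggregated entries carry the marker endpoint."""
    return entry.get("endpoint") == AGGREGATED_ENDPOINT
```

Note that the summary text lives in error_message even when every call succeeded, so consumers should not treat a non-empty error_message on an aggregated entry as a failure.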
Context Preservation:
- ✅ Preserved: Total costs, tokens, call counts, success/failure rates, time ranges
- ✅ Preserved: Provider and billing period grouping
- ✅ Preserved: Average response time
- ❌ Lost: Individual endpoint details, specific error messages, request/response sizes
Current Implementation Status
✅ Implemented:
- Automatic log wrapping when limit exceeded
- Aggregation by provider and billing period
- Context preservation for aggregated data
- Frontend display of aggregated logs with special formatting
⚠️ Potential Issues:
- No Time-Based Retention: Only count-based, not age-based cleanup
- No Manual Cleanup Script: No scheduled job to clean very old logs
- Database Growth: Aggregated logs still count toward the 5,000 limit
- No Archive Strategy: No mechanism to move old logs to archive tables
Recommendations
1. Add Time-Based Retention:
   - Archive logs older than 12 months
   - Keep aggregated logs for 24 months
   - Delete logs older than 24 months
2. Improve Aggregation Strategy:
   - Consider aggregating by month for logs older than 90 days
   - Create separate archive table for very old logs
   - Implement tiered storage (hot/warm/cold)
3. Add Cleanup Script:
   - Scheduled job to run monthly
   - Archive old logs before deletion
   - Maintain audit trail
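A cleanup job along these lines could start from a planning step that decides what happens to each log before anything is moved or deleted. The sketch below is an assumption-laden illustration: plan_cleanup and audit_line are hypothetical names, and 12 and 24 months are approximated as 365 and 730 days.

```python
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=365)  # approx. 12 months
DELETE_AFTER = timedelta(days=730)   # approx. 24 months


def plan_cleanup(log_timestamps, now):
    """Split log timestamps into keep / archive / delete buckets by age."""
    plan = {"keep": [], "archive": [], "delete": []}
    for ts in log_timestamps:
        age = now - ts
        if age > DELETE_AFTER:
            plan["delete"].append(ts)
        elif age > ARCHIVE_AFTER:
            plan["archive"].append(ts)
        else:
            plan["keep"].append(ts)
    return plan


def audit_line(plan, now):
    """One-line audit record describing what a cleanup run decided."""
    return (
        f"{now.isoformat()} cleanup: kept={len(plan['keep'])} "
        f"archived={len(plan['archive'])} deleted={len(plan['delete'])}"
    )
```

Separating the plan from its execution lets the job run in a dry-run mode first and write the audit line before any row is touched.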
2. Subscription Renewal History
Current Storage Limits
Per-User Limit:
- No Hard Limit: Unlimited storage (no cleanup/aggregation)
- API Query Limit: Maximum 100 records per query (limit parameter: ge=1, le=100)
- Frontend Default: 20 records per page (configurable: 10, 20, 50, 100)
Storage Characteristics:
- One record per renewal/upgrade/downgrade event
- Includes usage snapshot before renewal (usage_before_renewal JSON field)
- Includes payment information
- Includes period information (start/end dates)
Current Implementation Status
✅ Implemented:
- Full history tracking for all subscription events
- Usage snapshots preserved in JSON format
- Pagination support
- No automatic cleanup (preserves all history)
⚠️ Potential Issues:
- Unlimited Growth: No retention policy - will grow indefinitely
- Large JSON Snapshots: usage_before_renewal can be large for active users
- No Archive Strategy: All records kept in primary table
- No Cleanup Script: No mechanism to archive old records
Recommendations
1. Add Retention Policy:
   - Keep detailed records for last 24 months
   - Archive records older than 24 months
   - Keep summary records (without full usage snapshots) for 7 years (tax/audit)
2. Optimize Storage:
   - Compress usage_before_renewal JSON for old records
   - Create summary table for very old records
   - Remove detailed usage snapshots after 12 months
3. Add Cleanup Script:
   - Monthly job to archive records older than 24 months
   - Maintain summary records for compliance
   - Preserve payment information indefinitely
3. Log Replay Mechanism
Current Status
❌ No Log Replay: There is no mechanism to replay or reconstruct usage from logs.
What Would Be Needed:
- Event Sourcing Pattern: Store events that can be replayed
- Replay Service: Service to process logs and rebuild state
- State Reconstruction: Ability to rebuild UsageSummary from APIUsageLog entries
Current Data Flow
API Call → monitoring_middleware → UsageTrackingService.track_api_usage()
↓
APIUsageLog (individual record)
↓
UsageSummary (aggregated by billing period)
Issue: If UsageSummary is corrupted or lost, it cannot be fully reconstructed from APIUsageLog because:
- Aggregation happens in real-time
- No event sourcing pattern
- No replay mechanism
Recommendations
1. Add Replay Capability:
   - Create replay_usage_logs() function in UsageTrackingService
   - Rebuild UsageSummary from APIUsageLog entries
   - Support replay for specific billing periods
2. Add Validation:
   - Periodic job to validate UsageSummary against APIUsageLog
   - Detect discrepancies and auto-correct
   - Alert on data inconsistencies
3. Consider Event Sourcing (Future):
   - Store events instead of just logs
   - Enable full state reconstruction
   - Support time-travel queries
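The replay and validation recommendations could look roughly like the sketch below. It is not an existing part of UsageTrackingService: the dict-based log shape, the field names (billing_period, tokens, cost, call_count), and find_discrepancies are all assumptions made for illustration. Aggregated entries are assumed to carry a call_count so that wrapped logs still contribute their full totals.

```python
from collections import defaultdict


def replay_usage_logs(logs, billing_period=None):
    """Rebuild per-period totals from log dicts, optionally for one period."""
    summaries = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
    for log in logs:
        period = log["billing_period"]
        if billing_period is not None and period != billing_period:
            continue
        summary = summaries[period]
        # Individual logs count as one call; aggregated entries carry a count.
        summary["calls"] += log.get("call_count", 1)
        summary["tokens"] += log["tokens"]
        summary["cost"] += log["cost"]
    return dict(summaries)


def find_discrepancies(stored, rebuilt, tolerance=1e-6):
    """Return the billing periods whose stored totals differ from the replay."""
    mismatched = []
    for period in sorted(set(stored) | set(rebuilt)):
        a, b = stored.get(period), rebuilt.get(period)
        if (
            a is None
            or b is None
            or a["calls"] != b["calls"]
            or a["tokens"] != b["tokens"]
            or abs(a["cost"] - b["cost"]) > tolerance
        ):
            mismatched.append(period)
    return mismatched
```

A periodic validation job could call find_discrepancies and alert on a non-empty result; whether to auto-correct from the replayed totals is a separate policy decision.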
4. Summary and Action Items
Current State
| Metric | API Usage Logs | Renewal History |
|---|---|---|
| Per-User Limit | 5,000 logs | Unlimited |
| Aggregation | ✅ Yes (automatic) | ❌ No |
| Retention Policy | ⚠️ Count-based only | ❌ None |
| Cleanup Script | ❌ No | ❌ No |
| Log Replay | ❌ No | ❌ No |
| Archive Strategy | ❌ No | ❌ No |
Priority Actions
High Priority:
1. ✅ Log Wrapping Works: Already implemented and functional
2. ⚠️ Add Time-Based Retention: Implement age-based cleanup for API logs
3. ⚠️ Add Renewal History Retention: Implement retention policy for renewal history
Medium Priority:
4. Add Cleanup Scripts: Create scheduled jobs for both tables
5. Add Archive Tables: Create archive tables for old data
6. Add Replay Capability: Enable reconstruction of UsageSummary from logs
Low Priority:
7. Optimize Storage: Compress JSON fields, optimize indexes
8. Add Monitoring: Alert on storage growth, aggregation events
9. Documentation: Document retention policies for users
Code Locations
Log Wrapping:
- backend/services/subscription/log_wrapping_service.py
- Triggered in: backend/api/subscription/routes/logs.py (line 86-89)
Usage Logs API:
- backend/api/subscription/routes/logs.py
- Frontend: frontend/src/components/billing/UsageLogsTable.tsx
Renewal History API:
- backend/api/subscription/routes/subscriptions.py (line 519-586)
- Frontend: frontend/src/components/billing/SubscriptionRenewalHistory.tsx
Models:
- backend/models/subscription_models.py
  - APIUsageLog (line 127-173)
  - SubscriptionRenewalHistory (line 341-389)
5. Recommended Retention Policies
API Usage Logs
┌─────────────────────────────────────────────────────────────┐
│  Retention Policy: API Usage Logs                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  0-30 days:      Detailed logs (all fields)                 │
│  30-90 days:     Detailed logs (keep 4,000 most recent)     │
│  90-365 days:    Aggregated by month                        │
│  365-730 days:   Aggregated by quarter                      │
│  730+ days:      Archive to separate table                  │
│                                                             │
│  Max per user:   5,000 records (detailed + aggregated)      │
│  Archive table:  Unlimited (for compliance/audit)           │
└─────────────────────────────────────────────────────────────┘
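The tiers in this policy can be expressed as a small lookup by age. The sketch below is illustrative only: the tier names and both function names are hypothetical, and because adjacent ranges in the policy share a boundary day (30, 90, 365, 730), the sketch assumes each boundary belongs to the younger tier.

```python
from datetime import date
from typing import Optional


def retention_tier(age_days: int) -> str:
    """Map a log's age in days to its proposed retention tier."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 30:
        return "detailed"
    if age_days <= 90:
        return "detailed_capped"       # only the 4,000 most recent are kept
    if age_days <= 365:
        return "aggregated_monthly"
    if age_days <= 730:
        return "aggregated_quarterly"
    return "archive"


def bucket_key(log_date: date, tier: str) -> Optional[str]:
    """Grouping key for aggregated tiers; None for tiers that are not grouped."""
    if tier == "aggregated_monthly":
        return f"{log_date.year}-{log_date.month:02d}"
    if tier == "aggregated_quarterly":
        quarter = (log_date.month - 1) // 3 + 1
        return f"{log_date.year}-Q{quarter}"
    return None
```

The bucket key would be combined with provider when grouping, in the same way the current wrapping groups by provider and billing period.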
Subscription Renewal History
┌─────────────────────────────────────────────────────────────┐
│  Retention Policy: Renewal History                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  0-12 months:      Full records with usage snapshots        │
│  12-24 months:     Full records (compressed snapshots)      │
│  24-84 months:     Summary records (no usage snapshots)     │
│  84+ months:       Archive to separate table                │
│                                                             │
│  Payment data:     Keep indefinitely (tax/audit compliance) │
│  Usage snapshots:  Remove after 12 months                   │
└─────────────────────────────────────────────────────────────┘
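A possible way to apply this policy to a single renewal record is sketched below, using the standard-library zlib and json modules. The function names, the usage_before_renewal_compressed key, the archive flag, and the payment_amount field in the usage note are hypothetical. The sketch follows the tier rows (compress the snapshot between 12 and 24 months, drop it after 24), which is one reading of the policy; the "Remove after 12 months" line could also be read as dropping snapshots earlier.

```python
import json
import zlib


def apply_renewal_retention(record: dict, age_months: int) -> dict:
    """Return a copy of a renewal record shaped for its retention tier."""
    result = dict(record)
    snapshot = result.pop("usage_before_renewal", None)
    if snapshot is not None:
        if age_months <= 12:
            result["usage_before_renewal"] = snapshot  # full snapshot
        elif age_months <= 24:
            raw = json.dumps(snapshot, sort_keys=True).encode("utf-8")
            result["usage_before_renewal_compressed"] = zlib.compress(raw)
        # Older than 24 months: snapshot dropped, all other fields kept.
    result["archive"] = age_months > 84  # candidate for the archive table
    return result


def read_snapshot(record: dict):
    """Return the usage snapshot, decompressing it if needed, or None."""
    compressed = record.get("usage_before_renewal_compressed")
    if compressed is not None:
        return json.loads(zlib.decompress(compressed).decode("utf-8"))
    return record.get("usage_before_renewal")
```

For example, a record such as {"id": 1, "payment_amount": 29.0, "usage_before_renewal": {...}} keeps its payment field at every age, while the snapshot is stored in full, compressed, or removed depending on age_months.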
6. Implementation Plan
Phase 1: Immediate (No Breaking Changes)
- Document current behavior
- Add monitoring/alerts for log counts
- Add database indexes for performance
Phase 2: Retention Policies (Backward Compatible)
- Add time-based retention to log wrapping
- Create archive tables
- Add cleanup scripts (manual execution)
Phase 3: Automation
- Schedule cleanup jobs (cron/scheduler)
- Add replay capability
- Add validation/audit jobs
Phase 4: Optimization
- Compress JSON fields
- Optimize queries with better indexes
- Add caching for frequently accessed data