Time-Based Retention Implementation for API Usage Logs
Overview
Implemented time-based retention for API usage logs in addition to the existing count-based retention. This ensures that logs older than a specified retention period are automatically aggregated, regardless of the total log count.
Implementation Details
Changes Made
File: backend/services/subscription/log_wrapping_service.py
1. Added Time-Based Retention Constant
RETENTION_DAYS = 90 # Time-based retention: aggregate logs older than 90 days
2. Enhanced check_and_wrap_logs() Method
Before: Only checked count-based limit (5,000 logs)
After: Checks both:
- Count-based: If user has more than 5,000 logs
- Time-based: If user has logs older than 90 days
Key Features:
- Detects logs older than retention period
- Excludes already aggregated logs from time-based checks
- Provides detailed trigger reasons in response
- Reports how many old logs were aggregated
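The trigger check described above can be sketched in plain Python. This is a minimal illustration and not the service's actual code. The UsageLog dataclass and detect_triggers() are hypothetical, and logs are held in memory rather than queried from the database. The count check is assumed to include already aggregated rows. The trigger-reason strings mirror the format shown in the response example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

RETENTION_DAYS = 90       # value used by LogWrappingService
MAX_LOGS_PER_USER = 5000  # count-based trigger threshold
AGGREGATED_MARKER = "[AGGREGATED]"


@dataclass
class UsageLog:
    # Hypothetical, minimal stand-in for an api_usage_logs row.
    id: int
    timestamp: datetime
    endpoint: str


def detect_triggers(logs: List[UsageLog], now: datetime) -> Tuple[List[str], int]:
    """Return (trigger_reasons, old_log_count); no reasons means no wrapping."""
    reasons: List[str] = []

    # Count-based trigger (assumption: aggregated rows count toward the total).
    total = len(logs)
    if total > MAX_LOGS_PER_USER:
        reasons.append(f"count limit ({total} > {MAX_LOGS_PER_USER})")

    # Time-based trigger: detailed logs older than the retention window.
    # Already aggregated rows are excluded so they never re-trigger wrapping.
    cutoff = now - timedelta(days=RETENTION_DAYS)
    old_count = sum(
        1
        for log in logs
        if log.endpoint != AGGREGATED_MARKER and log.timestamp < cutoff
    )
    if old_count:
        reasons.append(
            f"time-based retention ({old_count} logs older than {RETENTION_DAYS} days)"
        )
    return reasons, old_count


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        UsageLog(1, now - timedelta(days=120), "/v1/chat"),
        UsageLog(2, now - timedelta(days=400), AGGREGATED_MARKER),
        UsageLog(3, now, "/v1/chat"),
    ]
    print(detect_triggers(sample, now))
```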
3. Enhanced _wrap_old_logs() Method
New Parameters:
time_based: Boolean flag to prioritize time-based retention
Aggregation Strategy:
- Time-based mode: Aggregates ALL logs older than 90 days (excluding already aggregated)
- Count-based mode: Aggregates oldest logs beyond 4,000 limit
- Combined mode: When the count-based trigger is primary, old logs are also included, so very old logs are not kept merely because they fall within the count limit
Key Improvements:
- Prevents re-aggregation of already aggregated logs (endpoint != '[AGGREGATED]')
- Prioritizes old logs even in count-based mode
- Better logging for debugging and monitoring
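This is a minimal sketch of how _wrap_old_logs() might choose which logs to aggregate in each mode. The names select_logs_to_aggregate(), UsageLog and LOGS_TO_KEEP are hypothetical. The sketch assumes two things: the 4,000 limit applies to detailed (non-aggregated) logs, and time_based=True is passed only when the count trigger did not fire.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

RETENTION_DAYS = 90
LOGS_TO_KEEP = 4000  # detailed logs kept after count-based wrapping
AGGREGATED_MARKER = "[AGGREGATED]"


@dataclass
class UsageLog:
    # Hypothetical stand-in for an api_usage_logs row.
    id: int
    timestamp: datetime
    endpoint: str


def select_logs_to_aggregate(
    logs: List[UsageLog], now: datetime, time_based: bool, keep: int = LOGS_TO_KEEP
) -> List[UsageLog]:
    """Pick the detailed logs that should be folded into aggregated records."""
    # Never re-aggregate rows that are already aggregates; oldest first.
    detailed = sorted(
        (log for log in logs if log.endpoint != AGGREGATED_MARKER),
        key=lambda log: log.timestamp,
    )
    cutoff = now - timedelta(days=RETENTION_DAYS)
    old = [log for log in detailed if log.timestamp < cutoff]

    if time_based:
        # Time-based mode: every detailed log past the retention window.
        return old

    # Count-based mode: the oldest logs beyond the keep limit...
    excess = detailed[: max(0, len(detailed) - keep)]
    # ...merged with all old logs, so age alone still forces aggregation.
    selected = {log.id: log for log in excess}
    selected.update((log.id, log) for log in old)
    return sorted(selected.values(), key=lambda log: log.timestamp)
```

Every old log is older than every recent log, so in count-based mode the merged set is just a longer oldest-first prefix. The explicit union only mirrors the "merge both sets" wording.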
How It Works
Automatic Triggering
Log wrapping is triggered automatically on every /usage-logs API call:
# In backend/api/subscription/routes/logs.py
wrapping_service = LogWrappingService(db)
wrap_result = wrapping_service.check_and_wrap_logs(user_id)
Retention Logic Flow
1. Check total log count
├─ If > 5,000 → Count-based trigger
└─ If ≤ 5,000 → Continue
2. Check for old logs (> 90 days)
├─ If found → Time-based trigger
└─ If none → No action needed
3. If either trigger active:
├─ Time-based: Aggregate ALL logs older than 90 days
├─ Count-based: Aggregate oldest logs beyond 4,000 limit
└─ Combined: Merge both sets (prioritize old logs)
4. Create aggregated records
├─ Group by provider + billing period
├─ Preserve: costs, tokens, counts, success rates
└─ Delete individual logs that were aggregated
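Step 4 of the flow can be illustrated with a small in-memory grouping. This sketch makes two assumptions: a billing period is a calendar month (YYYY-MM), and the field names (cost, tokens, success) are hypothetical. The real service writes aggregated rows to the database and deletes the source logs; both steps are omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

AGGREGATED_MARKER = "[AGGREGATED]"


@dataclass
class UsageLog:
    # Hypothetical row shape; field names are illustrative.
    provider: str
    timestamp: datetime
    cost: float
    tokens: int
    success: bool


@dataclass
class AggregatedRecord:
    provider: str
    billing_period: str  # assumption: calendar month "YYYY-MM"
    endpoint: str
    request_count: int
    success_count: int
    total_tokens: int
    total_cost: float

    @property
    def success_rate(self) -> float:
        return self.success_count / self.request_count if self.request_count else 0.0


def aggregate(logs: List[UsageLog]) -> List[AggregatedRecord]:
    """Collapse detailed logs into one record per (provider, billing period)."""
    groups: Dict[Tuple[str, str], List[UsageLog]] = defaultdict(list)
    for log in logs:
        groups[(log.provider, log.timestamp.strftime("%Y-%m"))].append(log)

    records = []
    for (provider, period), items in sorted(groups.items()):
        records.append(AggregatedRecord(
            provider=provider,
            billing_period=period,
            endpoint=AGGREGATED_MARKER,  # marks the row so it is never re-aggregated
            request_count=len(items),
            success_count=sum(1 for i in items if i.success),
            total_tokens=sum(i.tokens for i in items),
            total_cost=sum(i.cost for i in items),
        ))
    # A caller would persist `records`, then delete the source logs.
    return records
```

The sketch stores success_count rather than only a rate. This keeps the rate exact and lets two aggregates for the same period be combined later by simple addition.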
Example Scenarios
Scenario 1: Time-Based Only
- User has 3,000 logs
- 500 logs are older than 90 days
- Result: 500 old logs aggregated, 2,500 detailed logs kept
Scenario 2: Count-Based Only
- User has 6,000 logs (all recent)
- Result: 2,000 oldest logs aggregated, 4,000 detailed logs kept
Scenario 3: Both Triggers
- User has 6,000 logs
- 1,000 logs are older than 90 days
- Result: All 1,000 old logs + 1,000 additional oldest logs aggregated, 4,000 detailed logs kept
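The arithmetic behind the three scenarios can be checked with a short function. plan() is hypothetical and assumes none of the user's logs are already aggregated. It relies on one fact: logs older than 90 days are always the oldest ones, so count-based trimming and time-based selection overlap.

```python
from typing import Tuple

MAX_LOGS_PER_USER = 5000  # count-based trigger threshold
LOGS_TO_KEEP = 4000       # detailed logs kept after count-based wrapping


def plan(total_logs: int, old_logs: int) -> Tuple[int, int]:
    """Return (aggregated, kept_detailed) for one wrapping pass.

    Assumes none of the logs are already aggregated. Old logs are always the
    oldest, so they sit at the front of the oldest-first trimming order.
    """
    if total_logs > MAX_LOGS_PER_USER:
        # Trim down to LOGS_TO_KEEP, but never keep a log past retention.
        aggregated = max(total_logs - LOGS_TO_KEEP, old_logs)
    else:
        # Only the time-based trigger can apply (0 old logs means no action).
        aggregated = old_logs
    return aggregated, total_logs - aggregated


print(plan(3000, 500))   # Scenario 1 -> (500, 2500)
print(plan(6000, 0))     # Scenario 2 -> (2000, 4000)
print(plan(6000, 1000))  # Scenario 3 -> (2000, 4000)
```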
Configuration
Retention Period
Currently set to 90 days. To change:
# In LogWrappingService class
RETENTION_DAYS = 90 # Change this value
Recommended Values:
- 90 days (current): Good balance for most use cases
- 60 days: More aggressive, faster aggregation
- 180 days: Less aggressive, keeps more detailed history
Count Limits
MAX_LOGS_PER_USER = 5000 # Total logs per user
logs_to_keep = 4000 # Detailed logs to keep
Response Format
The check_and_wrap_logs() method now returns additional fields (example values shown):
{
'wrapped': True,
'total_logs_before': 6000,
'total_logs_after': 4500,
'aggregated_logs': 1500,
'aggregated_periods': [...],
'trigger_reasons': [
'count limit (6000 > 5000)',
'time-based retention (500 logs older than 90 days)'
],
'old_logs_aggregated': 500,
'message': 'Wrapped 1500 logs into 12 aggregated records'
}
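For typed callers, such as code that renders the wrap result, the response shape can be described with a TypedDict. WrapResult and summarize() are hypothetical. The element type of aggregated_periods is not documented here. The sketch also assumes that every key except wrapped may be missing when nothing was wrapped.

```python
from typing import Any, List, TypedDict


class _WrapResultRequired(TypedDict):
    wrapped: bool


class WrapResult(_WrapResultRequired, total=False):
    # Assumption: these keys may be absent when no wrapping happened.
    total_logs_before: int
    total_logs_after: int
    aggregated_logs: int
    aggregated_periods: List[Any]  # element shape not documented here
    trigger_reasons: List[str]
    old_logs_aggregated: int
    message: str


def summarize(result: WrapResult) -> str:
    """Build a one-line summary of a wrap result, e.g. for a status banner."""
    if not result["wrapped"]:
        return "no wrapping needed"
    reasons = "; ".join(result.get("trigger_reasons", []))
    return f"{result.get('aggregated_logs', 0)} logs aggregated ({reasons})"
```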
Benefits
- Automatic Cleanup: Old logs are automatically aggregated without manual intervention
- Storage Efficiency: Prevents indefinite growth of detailed logs
- Context Preservation: Aggregated logs maintain all important metrics
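- Dual Protection: The count limit bounds how many detailed logs a user keeps, and the time limit bounds how old they can get
- No Loss of Summary Metrics: Historical costs, tokens, counts, and success rates are preserved in aggregated form, although per-request detail for aggregated logs is removed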
Testing
Manual Testing
1. Create old logs. For testing, you can manually update timestamps in the database (SQLite syntax). The subquery must also filter by user_id; otherwise it can select other users' rows and age fewer than 10 of test_user's logs:
UPDATE api_usage_logs SET timestamp = datetime('now', '-100 days') WHERE user_id = 'test_user' AND id IN (SELECT id FROM api_usage_logs WHERE user_id = 'test_user' LIMIT 10);
2. Trigger wrapping by calling /api/subscription/usage-logs
3. Verify:
- Old logs are aggregated
- Aggregated logs have endpoint = '[AGGREGATED]'
- Total log count is reduced
- Costs and tokens are preserved in aggregated records
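The data-preparation part of the manual test can be scripted against an in-memory SQLite database using Python's standard sqlite3 module. The table contains only the columns this check needs; the real schema is assumed to have more. count_old() is a hypothetical helper. The UPDATE filters the subquery by user_id, so only test_user's rows can be aged. Other users' rows are inserted first, which is exactly the case where that filter matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal schema: only the columns used by this check are assumed.
conn.execute(
    "CREATE TABLE api_usage_logs ("
    "id INTEGER PRIMARY KEY, user_id TEXT, endpoint TEXT, timestamp TEXT)"
)
# Insert another user's rows first so their ids come before test_user's.
rows = [("other_user", "/v1/chat")] * 5 + [("test_user", "/v1/chat")] * 20
conn.executemany(
    "INSERT INTO api_usage_logs (user_id, endpoint, timestamp) "
    "VALUES (?, ?, datetime('now'))",
    rows,
)

# Age 10 of test_user's logs; the subquery is restricted to the same user.
conn.execute(
    "UPDATE api_usage_logs SET timestamp = datetime('now', '-100 days') "
    "WHERE user_id = 'test_user' AND id IN "
    "(SELECT id FROM api_usage_logs WHERE user_id = 'test_user' LIMIT 10)"
)


def count_old(user_id: str) -> int:
    """Count one user's detailed logs older than the 90-day window."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM api_usage_logs "
        "WHERE user_id = ? AND endpoint != '[AGGREGATED]' "
        "AND timestamp < datetime('now', '-90 days')",
        (user_id,),
    ).fetchone()
    return n


print(count_old("test_user"), count_old("other_user"))  # 10 0
```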
Expected Behavior
- Logs older than 90 days are automatically aggregated
- Aggregated logs are not re-aggregated
- At most the 4,000 most recent logs remain detailed (fewer when some of them are older than 90 days)
- All historical data is preserved in aggregated form
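The expected behaviors can be turned into a reusable post-wrap check. check_wrap_invariants() is a hypothetical helper that works on plain dicts with assumed keys (id, endpoint, cost, tokens). It verifies three things: totals are unchanged, at most 4,000 detailed logs remain, and pre-existing aggregated rows were not consumed again.

```python
from typing import Dict, List

AGGREGATED_MARKER = "[AGGREGATED]"
LOGS_TO_KEEP = 4000


def check_wrap_invariants(
    before: List[Dict], after: List[Dict], keep: int = LOGS_TO_KEEP
) -> List[str]:
    """Return the violated expectations for one wrap; an empty list means OK.

    Rows are plain dicts with hypothetical keys: id, endpoint, cost, tokens.
    """
    problems: List[str] = []

    # Costs and tokens must survive aggregation unchanged in total.
    for key in ("cost", "tokens"):
        total_before = sum(row[key] for row in before)
        total_after = sum(row[key] for row in after)
        if abs(total_before - total_after) > 1e-9:
            problems.append(f"{key} not preserved: {total_before} -> {total_after}")

    # No more than `keep` detailed logs should remain.
    detailed = [row for row in after if row["endpoint"] != AGGREGATED_MARKER]
    if len(detailed) > keep:
        problems.append(f"{len(detailed)} detailed logs remain (> {keep})")

    # Aggregated rows that existed before must not be re-aggregated (removed).
    old_agg = {row["id"] for row in before if row["endpoint"] == AGGREGATED_MARKER}
    missing = old_agg - {row["id"] for row in after}
    if missing:
        problems.append(f"pre-existing aggregates removed: {sorted(missing)}")

    return problems
```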
Monitoring
The service logs detailed information:
[LogWrapping] User {user_id} needs log wrapping. Total: 6000, Old logs: 500. Triggers: count limit, time-based retention
[LogWrapping] Time-based aggregation: Found 500 logs older than 90 days
[LogWrapping] Wrapped 1500 logs into 12 aggregated records. Remaining logs: 4500
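To turn these log lines into metrics, the summary line can be parsed with a regular expression. The pattern is based only on the message format shown above, and parse_wrap_summary() is hypothetical. It uses search() so that any prefix added by a logging formatter (timestamp, level) does not break matching.

```python
import re
from typing import Optional, Tuple

# Matches the summary line emitted after a successful wrap.
WRAP_LINE = re.compile(
    r"\[LogWrapping\] Wrapped (\d+) logs into (\d+) aggregated records\. "
    r"Remaining logs: (\d+)"
)


def parse_wrap_summary(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (wrapped, aggregated_records, remaining), or None for other lines."""
    match = WRAP_LINE.search(line)
    if match is None:
        return None
    wrapped, records, remaining = (int(g) for g in match.groups())
    return wrapped, records, remaining
```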
Future Enhancements
- Configurable Retention: Make RETENTION_DAYS configurable via an environment variable
- Tiered Retention: Different retention periods for different log types
- Archive Tables: Move very old aggregated logs to separate archive tables
- Scheduled Jobs: Run aggregation on a schedule instead of on-demand
- Metrics: Track aggregation statistics over time
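The first enhancement could look roughly like the following. LOG_RETENTION_DAYS and load_retention_days() are hypothetical names, not existing configuration. The sketch falls back to the current 90-day default. It rejects non-integer or non-positive values, so a misconfiguration fails loudly instead of aggregating every log.

```python
import os
from typing import Mapping

DEFAULT_RETENTION_DAYS = 90


def load_retention_days(
    env: Mapping[str, str] = os.environ, default: int = DEFAULT_RETENTION_DAYS
) -> int:
    """Read the retention period from LOG_RETENTION_DAYS (hypothetical name)."""
    raw = env.get("LOG_RETENTION_DAYS")
    if raw is None or not raw.strip():
        return default
    try:
        days = int(raw)  # int() tolerates surrounding whitespace
    except ValueError:
        raise ValueError(f"LOG_RETENTION_DAYS must be an integer, got {raw!r}") from None
    if days < 1:
        raise ValueError(f"LOG_RETENTION_DAYS must be >= 1, got {days}")
    return days


# Hypothetical use inside the service:
# class LogWrappingService:
#     RETENTION_DAYS = load_retention_days()
```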
Backward Compatibility
✅ Fully backward compatible:
- Existing count-based logic still works
- No breaking changes to API responses
- Old logs without actual_provider_name are handled correctly
- Aggregated logs are properly identified and displayed
Related Files
- backend/services/subscription/log_wrapping_service.py: Main implementation
- backend/api/subscription/routes/logs.py: API endpoint that triggers wrapping
- frontend/src/components/billing/UsageLogsTable.tsx: Frontend display
- docs/Billing_Subscription/LOG_STORAGE_AND_RETENTION_REVIEW.md: Review document