207 lines
6.4 KiB
Markdown
207 lines
6.4 KiB
Markdown
# Time-Based Retention Implementation for API Usage Logs
|
|
|
|
## Overview
|
|
|
|
Implemented time-based retention for API usage logs in addition to the existing count-based retention. This ensures that logs older than a specified retention period are automatically aggregated, regardless of the total log count.
|
|
|
|
## Implementation Details
|
|
|
|
### Changes Made
|
|
|
|
**File**: `backend/services/subscription/log_wrapping_service.py`
|
|
|
|
#### 1. Added Time-Based Retention Constant
|
|
|
|
```python
|
|
RETENTION_DAYS = 90 # Time-based retention: aggregate logs older than 90 days
|
|
```
|
|
|
|
#### 2. Enhanced `check_and_wrap_logs()` Method
|
|
|
|
**Before**: Only checked count-based limit (5,000 logs)
|
|
|
|
**After**: Checks both:
|
|
- **Count-based**: If user has more than 5,000 logs
|
|
- **Time-based**: If user has logs older than 90 days
|
|
|
|
**Key Features**:
|
|
- Detects logs older than retention period
|
|
- Excludes already aggregated logs from time-based checks
|
|
- Provides detailed trigger reasons in response
|
|
- Reports how many old logs were aggregated
|
|
|
|
#### 3. Enhanced `_wrap_old_logs()` Method
|
|
|
|
**New Parameters**:
|
|
- `time_based`: Boolean flag to prioritize time-based retention
|
|
|
|
**Aggregation Strategy**:
|
|
1. **Time-based mode**: Aggregates ALL logs older than 90 days (excluding already aggregated)
|
|
2. **Count-based mode**: Aggregates oldest logs beyond 4,000 limit
|
|
3. **Combined mode**: When count-based is primary, also includes old logs to prevent keeping very old logs just because they're within count limit
|
|
|
|
**Key Improvements**:
|
|
- Prevents re-aggregation of already aggregated logs (`endpoint != '[AGGREGATED]'`)
|
|
- Prioritizes old logs even in count-based mode
|
|
- Better logging for debugging and monitoring
|
|
|
|
## How It Works
|
|
|
|
### Automatic Triggering
|
|
|
|
The log wrapping is automatically triggered on every `/usage-logs` API call:
|
|
|
|
```python
|
|
# In backend/api/subscription/routes/logs.py
|
|
wrapping_service = LogWrappingService(db)
|
|
wrap_result = wrapping_service.check_and_wrap_logs(user_id)
|
|
```
|
|
|
|
### Retention Logic Flow
|
|
|
|
```
|
|
1. Check total log count
|
|
├─ If > 5,000 → Count-based trigger
|
|
└─ If ≤ 5,000 → Continue
|
|
|
|
2. Check for old logs (> 90 days)
|
|
├─ If found → Time-based trigger
|
|
└─ If none → No action needed
|
|
|
|
3. If either trigger active:
|
|
├─ Time-based: Aggregate ALL logs older than 90 days
|
|
├─ Count-based: Aggregate oldest logs beyond 4,000 limit
|
|
└─ Combined: Merge both sets (prioritize old logs)
|
|
|
|
4. Create aggregated records
|
|
├─ Group by provider + billing period
|
|
├─ Preserve: costs, tokens, counts, success rates
|
|
└─ Delete individual logs that were aggregated
|
|
```
|
|
|
|
### Example Scenarios
|
|
|
|
**Scenario 1: Time-Based Only**
|
|
- User has 3,000 logs
|
|
- 500 logs are older than 90 days
|
|
- **Result**: 500 old logs aggregated, 2,500 detailed logs kept
|
|
|
|
**Scenario 2: Count-Based Only**
|
|
- User has 6,000 logs (all recent)
|
|
- **Result**: 2,000 oldest logs aggregated, 4,000 detailed logs kept
|
|
|
|
**Scenario 3: Both Triggers**
|
|
- User has 6,000 logs
|
|
- 1,000 logs are older than 90 days
|
|
- **Result**: All 1,000 old logs + 1,000 additional oldest logs aggregated, 4,000 detailed logs kept
|
|
|
|
## Configuration
|
|
|
|
### Retention Period
|
|
|
|
Currently set to **90 days**. To change:
|
|
|
|
```python
|
|
# In LogWrappingService class
|
|
RETENTION_DAYS = 90 # Change this value
|
|
```
|
|
|
|
**Recommended Values**:
|
|
- **90 days** (current): Good balance for most use cases
|
|
- **60 days**: More aggressive, faster aggregation
|
|
- **180 days**: Less aggressive, keeps more detailed history
|
|
|
|
### Count Limits
|
|
|
|
```python
|
|
MAX_LOGS_PER_USER = 5000 # Total logs per user
|
|
logs_to_keep = 4000 # Detailed logs to keep
|
|
```
|
|
|
|
## Response Format
|
|
|
|
The `check_and_wrap_logs()` method now returns enhanced information:
|
|
|
|
```python
|
|
{
|
|
'wrapped': True,
|
|
'total_logs_before': 6000,
|
|
'total_logs_after': 4500,
|
|
'aggregated_logs': 1500,
|
|
'aggregated_periods': [...],
|
|
'trigger_reasons': [
|
|
'count limit (6000 > 5000)',
|
|
'time-based retention (500 logs older than 90 days)'
|
|
],
|
|
'old_logs_aggregated': 500,
|
|
'message': 'Wrapped 1500 logs into 12 aggregated records'
|
|
}
|
|
```
|
|
|
|
## Benefits
|
|
|
|
1. **Automatic Cleanup**: Old logs are automatically aggregated without manual intervention
|
|
2. **Storage Efficiency**: Prevents indefinite growth of detailed logs
|
|
3. **Context Preservation**: Aggregated logs maintain all important metrics
|
|
4. **Dual Protection**: Both count and time limits ensure efficient storage
|
|
5. **No Data Loss**: Historical data is preserved in aggregated form
|
|
|
|
## Testing
|
|
|
|
### Manual Testing
|
|
|
|
1. **Create old logs** (for testing, you can manually update timestamps in database):
|
|
```sql
|
|
UPDATE api_usage_logs
|
|
SET timestamp = datetime('now', '-100 days')
|
|
WHERE user_id = 'test_user' AND id IN (SELECT id FROM api_usage_logs LIMIT 10);
|
|
```
|
|
|
|
2. **Trigger wrapping** by calling `/api/subscription/usage-logs`
|
|
|
|
3. **Verify**:
|
|
- Old logs are aggregated
|
|
- Aggregated logs have `endpoint = '[AGGREGATED]'`
|
|
- Total log count reduced
|
|
- Costs and tokens preserved in aggregated records
|
|
|
|
### Expected Behavior
|
|
|
|
- Logs older than 90 days are automatically aggregated
|
|
- Aggregated logs are not re-aggregated
|
|
- Most recent 4,000 logs remain detailed
|
|
- All historical data is preserved in aggregated form
|
|
|
|
## Monitoring
|
|
|
|
The service logs detailed information:
|
|
|
|
```
|
|
[LogWrapping] User {user_id} needs log wrapping. Total: 6000, Old logs: 500. Triggers: count limit, time-based retention
|
|
[LogWrapping] Time-based aggregation: Found 500 logs older than 90 days
|
|
[LogWrapping] Wrapped 1500 logs into 12 aggregated records. Remaining logs: 4500
|
|
```
|
|
|
|
## Future Enhancements
|
|
|
|
1. **Configurable Retention**: Make `RETENTION_DAYS` configurable via environment variable
|
|
2. **Tiered Retention**: Different retention periods for different log types
|
|
3. **Archive Tables**: Move very old aggregated logs to separate archive tables
|
|
4. **Scheduled Jobs**: Run aggregation on a schedule instead of on-demand
|
|
5. **Metrics**: Track aggregation statistics over time
|
|
|
|
## Backward Compatibility
|
|
|
|
✅ **Fully backward compatible**:
|
|
- Existing count-based logic still works
|
|
- No breaking changes to API responses
|
|
- Old logs without `actual_provider_name` are handled correctly
|
|
- Aggregated logs are properly identified and displayed
|
|
|
|
## Related Files
|
|
|
|
- `backend/services/subscription/log_wrapping_service.py` - Main implementation
|
|
- `backend/api/subscription/routes/logs.py` - API endpoint that triggers wrapping
|
|
- `frontend/src/components/billing/UsageLogsTable.tsx` - Frontend display
|
|
- `docs/Billing_Subscription/LOG_STORAGE_AND_RETENTION_REVIEW.md` - Review document
|