AI Analysis and Content Strategy fixes. Enhanced Strategy Routes refactoring.
This commit is contained in:
280
docs/Billing_Subscription/LOG_STORAGE_AND_RETENTION_REVIEW.md
Normal file
280
docs/Billing_Subscription/LOG_STORAGE_AND_RETENTION_REVIEW.md
Normal file
@@ -0,0 +1,280 @@
|
||||
# Log Storage and Retention Review
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This document reviews the storage limits, retention policies, and log management mechanisms for:
|
||||
1. **API Usage Logs** (`api_usage_logs` table)
|
||||
2. **Subscription Renewal History** (`subscription_renewal_history` table)
|
||||
|
||||
## 1. API Usage Logs
|
||||
|
||||
### Current Storage Limits
|
||||
|
||||
**Per-User Limit:**
|
||||
- **Maximum Logs Per User**: `5,000` logs (defined in `LogWrappingService.MAX_LOGS_PER_USER`)
|
||||
- **Detailed Logs Kept**: `4,000` most recent logs
|
||||
- **Aggregation Threshold**: Logs older than 30 days OR beyond the 4,000 limit are aggregated
|
||||
|
||||
**API Query Limits:**
|
||||
- **Frontend Default**: 50 logs per page (configurable: 10, 25, 50, 100)
|
||||
- **Backend Maximum**: 5,000 logs per query (`limit` parameter: `ge=1, le=5000`)
|
||||
- **Pagination**: Fully supported with `offset` and `limit` parameters
|
||||
|
||||
### Log Wrapping/Aggregation Mechanism
|
||||
|
||||
**Service**: `LogWrappingService` (`backend/services/subscription/log_wrapping_service.py`)
|
||||
|
||||
**How It Works:**
|
||||
1. **Automatic Check**: Triggered on every `/usage-logs` API call via `check_and_wrap_logs()`
|
||||
2. **Threshold Detection**: When user exceeds 5,000 logs
|
||||
3. **Aggregation Strategy**:
|
||||
- Keeps most recent 4,000 logs as detailed records
|
||||
- Aggregates oldest logs beyond 4,000 limit
|
||||
- Groups by provider and billing period
|
||||
- Creates aggregated log entries with:
|
||||
- Total counts, tokens, costs
|
||||
- Average response time
|
||||
- Success/failure counts
|
||||
- Time range (oldest to newest timestamp)
|
||||
- Deletes individual logs that were aggregated
|
||||
|
||||
**Aggregated Log Format:**
|
||||
- `endpoint`: `"[AGGREGATED]"`
|
||||
- `method`: `"AGGREGATED"`
|
||||
- `model_used`: `"[{count} calls aggregated]"`
|
||||
- `error_message`: Contains summary (e.g., "Aggregated 150 calls: 145 success, 5 failed")
|
||||
- `is_aggregated`: Flag set to `true` in frontend
|
||||
|
||||
**Context Preservation:**
|
||||
- ✅ **Preserved**: Total costs, tokens, call counts, success/failure rates, time ranges
|
||||
- ✅ **Preserved**: Provider and billing period grouping
|
||||
- ✅ **Preserved**: Average response time
|
||||
- ❌ **Lost**: Individual endpoint details, specific error messages, request/response sizes
|
||||
|
||||
### Current Implementation Status
|
||||
|
||||
**✅ Implemented:**
|
||||
- Automatic log wrapping when limit exceeded
|
||||
- Aggregation by provider and billing period
|
||||
- Context preservation for aggregated data
|
||||
- Frontend display of aggregated logs with special formatting
|
||||
|
||||
**⚠️ Potential Issues:**
|
||||
1. **No Time-Based Retention**: Only count-based, not age-based cleanup
|
||||
2. **No Manual Cleanup Script**: No scheduled job to clean very old logs
|
||||
3. **Database Growth**: Aggregated logs still count toward the 5,000 limit
|
||||
4. **No Archive Strategy**: No mechanism to move old logs to archive tables
|
||||
|
||||
### Recommendations
|
||||
|
||||
1. **Add Time-Based Retention**:
|
||||
- Archive logs older than 12 months
|
||||
- Keep aggregated logs for 24 months
|
||||
- Delete logs older than 24 months
|
||||
|
||||
2. **Improve Aggregation Strategy**:
|
||||
- Consider aggregating by month for logs older than 90 days
|
||||
- Create separate archive table for very old logs
|
||||
- Implement tiered storage (hot/warm/cold)
|
||||
|
||||
3. **Add Cleanup Script**:
|
||||
- Scheduled job to run monthly
|
||||
- Archive old logs before deletion
|
||||
- Maintain audit trail
|
||||
|
||||
## 2. Subscription Renewal History
|
||||
|
||||
### Current Storage Limits
|
||||
|
||||
**Per-User Limit:**
|
||||
- **No Hard Limit**: Unlimited storage (no cleanup/aggregation)
|
||||
- **API Query Limit**: Maximum 100 records per query (`limit` parameter: `ge=1, le=100`)
|
||||
- **Frontend Default**: 20 records per page (configurable: 10, 20, 50, 100)
|
||||
|
||||
**Storage Characteristics:**
|
||||
- One record per renewal/upgrade/downgrade event
|
||||
- Includes usage snapshot before renewal (`usage_before_renewal` JSON field)
|
||||
- Includes payment information
|
||||
- Includes period information (start/end dates)
|
||||
|
||||
### Current Implementation Status
|
||||
|
||||
**✅ Implemented:**
|
||||
- Full history tracking for all subscription events
|
||||
- Usage snapshots preserved in JSON format
|
||||
- Pagination support
|
||||
- No automatic cleanup (preserves all history)
|
||||
|
||||
**⚠️ Potential Issues:**
|
||||
1. **Unlimited Growth**: No retention policy - will grow indefinitely
|
||||
2. **Large JSON Snapshots**: `usage_before_renewal` can be large for active users
|
||||
3. **No Archive Strategy**: All records kept in primary table
|
||||
4. **No Cleanup Script**: No mechanism to archive old records
|
||||
|
||||
### Recommendations
|
||||
|
||||
1. **Add Retention Policy**:
|
||||
- Keep detailed records for last 24 months
|
||||
- Archive records older than 24 months
|
||||
- Keep summary records (without full usage snapshots) for 7 years (tax/audit)
|
||||
|
||||
2. **Optimize Storage**:
|
||||
- Compress `usage_before_renewal` JSON for old records
|
||||
- Create summary table for very old records
|
||||
- Remove detailed usage snapshots after 12 months
|
||||
|
||||
3. **Add Cleanup Script**:
|
||||
- Monthly job to archive records older than 24 months
|
||||
- Maintain summary records for compliance
|
||||
- Preserve payment information indefinitely
|
||||
|
||||
## 3. Log Replay Mechanism
|
||||
|
||||
### Current Status
|
||||
|
||||
**❌ No Log Replay**: There is no mechanism to replay or reconstruct usage from logs.
|
||||
|
||||
**What Would Be Needed:**
|
||||
1. **Event Sourcing Pattern**: Store events that can be replayed
|
||||
2. **Replay Service**: Service to process logs and rebuild state
|
||||
3. **State Reconstruction**: Ability to rebuild `UsageSummary` from `APIUsageLog` entries
|
||||
|
||||
### Current Data Flow
|
||||
|
||||
```
|
||||
API Call → monitoring_middleware → UsageTrackingService.track_api_usage()
|
||||
↓
|
||||
APIUsageLog (individual record)
|
||||
↓
|
||||
UsageSummary (aggregated by billing period)
|
||||
```
|
||||
|
||||
**Issue**: If `UsageSummary` is corrupted or lost, it cannot be fully reconstructed from `APIUsageLog` because:
|
||||
- Aggregation happens in real-time
|
||||
- No event sourcing pattern
|
||||
- No replay mechanism
|
||||
|
||||
### Recommendations
|
||||
|
||||
1. **Add Replay Capability**:
|
||||
- Create `replay_usage_logs()` function in `UsageTrackingService`
|
||||
- Rebuild `UsageSummary` from `APIUsageLog` entries
|
||||
- Support replay for specific billing periods
|
||||
|
||||
2. **Add Validation**:
|
||||
- Periodic job to validate `UsageSummary` against `APIUsageLog`
|
||||
- Detect discrepancies and auto-correct
|
||||
- Alert on data inconsistencies
|
||||
|
||||
3. **Consider Event Sourcing** (Future):
|
||||
- Store events instead of just logs
|
||||
- Enable full state reconstruction
|
||||
- Support time-travel queries
|
||||
|
||||
## 4. Summary and Action Items
|
||||
|
||||
### Current State
|
||||
|
||||
| Metric | API Usage Logs | Renewal History |
|
||||
|--------|---------------|----------------|
|
||||
| **Per-User Limit** | 5,000 logs | Unlimited |
|
||||
| **Aggregation** | ✅ Yes (automatic) | ❌ No |
|
||||
| **Retention Policy** | ⚠️ Count-based only | ❌ None |
|
||||
| **Cleanup Script** | ❌ No | ❌ No |
|
||||
| **Log Replay** | ❌ No | ❌ No |
|
||||
| **Archive Strategy** | ❌ No | ❌ No |
|
||||
|
||||
### Priority Actions
|
||||
|
||||
**High Priority:**
|
||||
1. ✅ **Log Wrapping Works**: Already implemented and functional
|
||||
2. ⚠️ **Add Time-Based Retention**: Implement age-based cleanup for API logs
|
||||
3. ⚠️ **Add Renewal History Retention**: Implement retention policy for renewal history
|
||||
|
||||
**Medium Priority:**
|
||||
4. **Add Cleanup Scripts**: Create scheduled jobs for both tables
|
||||
5. **Add Archive Tables**: Create archive tables for old data
|
||||
6. **Add Replay Capability**: Enable reconstruction of UsageSummary from logs
|
||||
|
||||
**Low Priority:**
|
||||
7. **Optimize Storage**: Compress JSON fields, optimize indexes
|
||||
8. **Add Monitoring**: Alert on storage growth, aggregation events
|
||||
9. **Documentation**: Document retention policies for users
|
||||
|
||||
### Code Locations
|
||||
|
||||
**Log Wrapping:**
|
||||
- `backend/services/subscription/log_wrapping_service.py`
|
||||
- Triggered in: `backend/api/subscription/routes/logs.py` (line 86-89)
|
||||
|
||||
**Usage Logs API:**
|
||||
- `backend/api/subscription/routes/logs.py`
|
||||
- Frontend: `frontend/src/components/billing/UsageLogsTable.tsx`
|
||||
|
||||
**Renewal History API:**
|
||||
- `backend/api/subscription/routes/subscriptions.py` (line 519-586)
|
||||
- Frontend: `frontend/src/components/billing/SubscriptionRenewalHistory.tsx`
|
||||
|
||||
**Models:**
|
||||
- `backend/models/subscription_models.py`
|
||||
- `APIUsageLog` (line 127-173)
|
||||
- `SubscriptionRenewalHistory` (line 341-389)
|
||||
|
||||
## 5. Recommended Retention Policies
|
||||
|
||||
### API Usage Logs
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Retention Policy: API Usage Logs │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ 0-30 days: Detailed logs (all fields) │
|
||||
│ 30-90 days: Detailed logs (keep 4,000 most recent) │
|
||||
│ 90-365 days: Aggregated by month │
|
||||
│ 365-730 days: Aggregated by quarter │
|
||||
│ 730+ days: Archive to separate table │
|
||||
│ │
|
||||
│ Max per user: 5,000 records (detailed + aggregated) │
|
||||
│ Archive table: Unlimited (for compliance/audit) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Subscription Renewal History
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Retention Policy: Renewal History │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ 0-12 months: Full records with usage snapshots │
|
||||
│ 12-24 months: Full records (compressed snapshots) │
|
||||
│ 24-84 months: Summary records (no usage snapshots) │
|
||||
│ 84+ months: Archive to separate table │
|
||||
│ │
|
||||
│ Payment data: Keep indefinitely (tax/audit compliance) │
|
||||
│ Usage snapshots: Remove after 12 months │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 6. Implementation Plan
|
||||
|
||||
### Phase 1: Immediate (No Breaking Changes)
|
||||
1. Document current behavior
|
||||
2. Add monitoring/alerts for log counts
|
||||
3. Add database indexes for performance
|
||||
|
||||
### Phase 2: Retention Policies (Backward Compatible)
|
||||
1. Add time-based retention to log wrapping
|
||||
2. Create archive tables
|
||||
3. Add cleanup scripts (manual execution)
|
||||
|
||||
### Phase 3: Automation
|
||||
1. Schedule cleanup jobs (cron/scheduler)
|
||||
2. Add replay capability
|
||||
3. Add validation/audit jobs
|
||||
|
||||
### Phase 4: Optimization
|
||||
1. Compress JSON fields
|
||||
2. Optimize queries with better indexes
|
||||
3. Add caching for frequently accessed data
|
||||
Reference in New Issue
Block a user