# API Key Injection - How It Works in Production

## 🎯 The Problem You Identified

**Question:** "For production, when we read API keys from the database, how will they be exported to the environment?"

**Answer:** They are **temporarily injected** into `os.environ` for each request, then immediately cleaned up (restored to their previous values).

---

## 🔍 The Challenge

### **Existing Code Pattern:**

Most of your codebase uses this pattern:

```python
import os

import google.generativeai as genai


def generate_content(prompt: str):
    # Expects GEMINI_API_KEY in the process environment
    gemini_key = os.getenv('GEMINI_API_KEY')
    genai.configure(api_key=gemini_key)
    # ...
```

### **Production Problem:**

```
User A's request:
   ↓
os.getenv('GEMINI_API_KEY') → ??? (User A's key is in the database, not in os.environ)

User B's request (simultaneous):
   ↓
os.getenv('GEMINI_API_KEY') → ??? (User B's key is in the database, not in os.environ)
```

**Issue:** `os.environ` is a single, process-wide mapping shared by every request (and every thread) in the worker process, but each request needs its own user's keys!
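
A quick way to see that `os.environ` is process-wide: a write made in one thread is immediately visible to every other thread (and every request) in the same process. This minimal stdlib sketch uses a made-up variable name, `DEMO_GEMINI_API_KEY`, so it never touches a real key.

```python
import os
import threading

VAR = "DEMO_GEMINI_API_KEY"  # made-up name, so no real key is touched


def handle_user_b() -> None:
    # Runs in a different thread, but writes to the same process-wide os.environ
    os.environ[VAR] = "key_b_456"


os.environ[VAR] = "key_a_123"  # "User A's" key, set in the main thread
worker = threading.Thread(target=handle_user_b)
worker.start()
worker.join()

seen_by_user_a = os.environ[VAR]
print(seen_by_user_a)  # key_b_456: User B's write replaced User A's key
del os.environ[VAR]
```
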

---

## ✅ The Solution: Request-Scoped Injection

### **How It Works:**

```
1. Request arrives with Authorization: Bearer <user_a_token>
   ↓
2. API Key Injection Middleware extracts user_id from token
   ↓
3. Fetch User A's keys from database
   ↓
4. Temporarily inject into os.environ:
   - GEMINI_API_KEY = user_a_gemini_key
   - EXA_API_KEY = user_a_exa_key
   ↓
5. Process request (all os.getenv() calls get User A's keys)
   ↓
6. Request completes
   ↓
7. IMMEDIATELY clean up os.environ (remove User A's keys)
```
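
Steps 4 to 7 boil down to a save/inject/restore pattern. Below is a minimal stdlib sketch of it as a context manager; the name `temporary_env` and the `DEMO_GEMINI_API_KEY` variable are hypothetical, not taken from the codebase. The `finally` block restores the previous values even if the request raises.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


@contextmanager
def temporary_env(values: Dict[str, Optional[str]]) -> Iterator[None]:
    """Inject values into os.environ, then restore whatever was there before.

    A value of None means "make sure this variable is unset while inside the block".
    """
    saved = {name: os.environ.get(name) for name in values}  # None = was not set
    try:
        for name, value in values.items():  # step 4: inject
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        yield  # steps 5-6: process the request
    finally:
        for name, old in saved.items():  # step 7: restore, even on errors
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


# Usage with a made-up variable name for a hypothetical User A request
os.environ.pop("DEMO_GEMINI_API_KEY", None)
with temporary_env({"DEMO_GEMINI_API_KEY": "user_a_gemini_key"}):
    during = os.getenv("DEMO_GEMINI_API_KEY")  # 'user_a_gemini_key'
after = os.getenv("DEMO_GEMINI_API_KEY")  # None: it was unset before the request
```

This keeps the environment tidy around one request, but it does not make `os.environ` private to that request: anything else that runs between entry and exit sees the same values.
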

### **Key Insight:**

**The injection is meant to be request-scoped, not global:**
- User A's keys should exist in `os.environ` ONLY during User A's request
- They are removed immediately after the response is sent
- User B's request gets User B's keys injected
- This only holds if the two requests never overlap; when they do, they conflict (see the Concurrent Requests Example below)

---

## 🏗️ Architecture

### **Middleware Flow:**

```
FastAPI Request Pipeline:

┌─────────────────────────────────────────────────────────────┐
│ 1. Rate Limit Middleware                                    │
│    └─> Check rate limits                                    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 2. API Key Injection Middleware (NEW!)                      │
│    ├─> Extract user_id from Authorization header            │
│    ├─> Fetch user's API keys from database                  │
│    ├─> Inject into os.environ (temporarily)                 │
│    │     ├─> GEMINI_API_KEY = user_specific_key             │
│    │     ├─> EXA_API_KEY = user_specific_key                │
│    │     └─> COPILOTKIT_API_KEY = user_specific_key         │
│    ├─> await call_next(request) → endpoint (box 3) runs     │
│    ├─> [Response generated]                                 │
│    └─> CLEANUP: restore os.environ (finally block)          │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 3. Your Endpoint (e.g., /api/blog/generate)                 │
│    └─> Calls service that uses os.getenv('GEMINI_API_KEY')  │
│          └─> Gets user-specific key! ✅                     │
└─────────────────────────────────────────────────────────────┘
```
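
Box 2's cleanup only runs after box 3 has returned, because each middleware wraps the next one. The toy model below is plain asyncio, not Starlette's real classes; `FAKE_DB`, the layer functions, and `DEMO_GEMINI_API_KEY` are made up for illustration. In FastAPI itself, the middleware added last with `app.add_middleware()` ends up outermost and runs first, so to get the order in the diagram you would add the injection middleware before the rate-limit middleware.

```python
import asyncio
import os
from typing import Awaitable, Callable, Dict, List

# Toy middleware stack: each layer wraps the next handler, so code placed after
# `await next_handler(...)` runs on the way back out, after the endpoint finished.
Request = Dict[str, str]
Handler = Callable[[Request], Awaitable[str]]

FAKE_DB = {"user_a": "key_a_123"}  # hypothetical key store
VAR = "DEMO_GEMINI_API_KEY"  # demo variable name
events: List[str] = []


def rate_limit_layer(next_handler: Handler) -> Handler:
    async def handler(request: Request) -> str:
        events.append("1. rate limit check")
        return await next_handler(request)
    return handler


def key_injection_layer(next_handler: Handler) -> Handler:
    async def handler(request: Request) -> str:
        original = os.environ.get(VAR)
        os.environ[VAR] = FAKE_DB[request["user_id"]]
        events.append("2. inject keys")
        try:
            return await next_handler(request)  # the endpoint runs *inside* this call
        finally:
            if original is None:
                os.environ.pop(VAR, None)
            else:
                os.environ[VAR] = original
            events.append("2. cleanup")
    return handler


async def endpoint(request: Request) -> str:
    events.append("3. endpoint reads key")
    return os.environ.get(VAR, "")


app = rate_limit_layer(key_injection_layer(endpoint))
result = asyncio.run(app({"user_id": "user_a"}))
print(result)  # key_a_123
print(events)  # ['1. rate limit check', '2. inject keys', '3. endpoint reads key', '2. cleanup']
```
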

---

## 💻 Code Example

### **The Middleware:**

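```python
import os

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

# extract_user_from_token, user_api_keys and DEPLOY_ENV are project helpers;
# import them from wherever they live in your codebase.

# Env var -> key name in the mapping yielded by user_api_keys().
# 'gemini' and 'exa' match the original snippet; add COPILOTKIT_API_KEY etc. as needed.
INJECTED_KEYS = {
    'GEMINI_API_KEY': 'gemini',
    'EXA_API_KEY': 'exa',
}


class APIKeyInjectionMiddleware(BaseHTTPMiddleware):
    # BaseHTTPMiddleware calls dispatch(request, call_next) for every HTTP request
    async def dispatch(self, request: Request, call_next):
        # 1. Extract user_id from token
        user_id = extract_user_from_token(request)

        if not user_id or DEPLOY_ENV == 'local':
            return await call_next(request)  # Skip in local mode / anonymous requests

        # 2. Get user-specific keys from database
        with user_api_keys(user_id) as user_keys:
            # 3. Save original environment (None means "was not set")
            originals = {var: os.environ.get(var) for var in INJECTED_KEYS}

            try:
                # 4. Inject user-specific keys (os.environ only accepts str values)
                for var, name in INJECTED_KEYS.items():
                    value = user_keys.get(name)
                    if value:
                        os.environ[var] = value
                    else:
                        # Assumption: a user without this key gets none, rather than
                        # whatever value happens to be in os.environ right now
                        os.environ.pop(var, None)

                # 5. Process request with user-specific keys.
                # Other requests can run at every await in here (see the race below).
                return await call_next(request)
            finally:
                # 6. CRITICAL: restore original environment, even if the request failed.
                # Note: a streaming response body may still be sent after this runs.
                for var, original in originals.items():
                    if original is None:
                        os.environ.pop(var, None)
                    else:
                        os.environ[var] = original


# Registration (FastAPI): app.add_middleware(APIKeyInjectionMiddleware)
```
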
---

## 📊 Concurrent Requests Example

### **Scenario: Two Users Generate Content Simultaneously**

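```
TIME: 00:00:000
User A request arrives
├─> Extract user_id = "user_a"
├─> Fetch keys from DB: gemini_key = "key_a_123"
├─> Save original (unset), then os.environ['GEMINI_API_KEY'] = "key_a_123"
├─> User A's handler awaits an I/O call → event loop switches to other work
│
├─> TIME: 00:00:050 (50ms later)
│   User B request arrives
│   ├─> Extract user_id = "user_b"
│   ├─> Fetch keys from DB: gemini_key = "key_b_456"
│   ├─> Save original ("key_a_123"!), then
│   │   os.environ['GEMINI_API_KEY'] = "key_b_456"  ← Overwrites!
│   ├─> User B's handler awaits an I/O call
│   │
│   ├─> TIME: 00:00:070
│   │   User A's handler resumes
│   │   os.getenv('GEMINI_API_KEY') → "key_b_456" ❌ (User B's key!)
│   │
│   └─> TIME: 00:00:100
│       User B's handler reads os.getenv('GEMINI_API_KEY') → "key_b_456" ✅
│       User B response sent
│       os.environ['GEMINI_API_KEY'] restored to "key_a_123"
│
└─> TIME: 00:00:120
    User A response sent
    os.environ['GEMINI_API_KEY'] deleted (A's saved original was "unset")
```

**⚠️ PROBLEM: Race condition!** Between 50ms and 100ms, any `os.getenv()` call made by User A's request returns User B's key, so User A's Gemini calls are made with (and billed to) User B's key. The restores can also unwind in the wrong order: if User A finishes first, its cleanup deletes the variable while User B is still running, and User B's cleanup later puts `key_a_123` back, leaving User A's key in `os.environ` after User A's request has ended.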
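
The race is easy to reproduce without FastAPI. In this plain-asyncio sketch, `handle_request` is a hypothetical stand-in for the middleware plus a handler that reads the key after one `await`; `asyncio.sleep()` plays the part of a database or Gemini call. The timings follow the scenario above, except that User A finishes first, which exposes both failure modes.

```python
import asyncio
import os
from typing import List, Optional

VAR = "DEMO_GEMINI_API_KEY"  # demo name so a real key is never touched


async def handle_request(key: str, arrive_after: float, work: float) -> Optional[str]:
    """Hypothetical middleware + handler: inject, await I/O, read the key, restore."""
    await asyncio.sleep(arrive_after)  # request arrival time
    original = os.environ.get(VAR)  # middleware saves the original...
    os.environ[VAR] = key  # ...and injects this user's key
    try:
        await asyncio.sleep(work)  # e.g. a DB query or Gemini call
        return os.environ.get(VAR)  # the service reads "its" key
    finally:
        if original is None:
            os.environ.pop(VAR, None)
        else:
            os.environ[VAR] = original


async def main() -> List[Optional[str]]:
    return await asyncio.gather(
        handle_request("key_a_123", arrive_after=0.00, work=0.10),  # User A
        handle_request("key_b_456", arrive_after=0.05, work=0.10),  # User B
    )


os.environ.pop(VAR, None)
seen = asyncio.run(main())
leftover = os.environ.pop(VAR, None)
print(seen)  # ['key_b_456', None]: A read B's key, B found no key at all
print(leftover)  # key_a_123: A's key outlived A's request
```
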
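
---

## 🔒 Thread Safety: Does asyncio Prevent This?

No. FastAPI runs on asyncio, whose event loop uses a single thread, but single-threaded does not mean one request at a time:

- Requests are processed **concurrently**: the event loop switches to another request at every `await` (a database query, an HTTP call to Gemini, or `await call_next(request)` in the middleware itself).
- All interleaved requests share the same `os.environ`, so the race above happens with plain asyncio; no threads are required.
- FastAPI also runs plain `def` endpoints and dependencies in a thread pool, so threads are involved even if your code never creates one.

```
User A request:
├─> Inject A's keys
├─> await generate_content()      ← event loop switches to User B here
│
│   User B request (interleaved, not "after A"):
│   ├─> Inject B's keys            ← overwrites A's keys
│   ├─> await generate_content()
│   └─> Cleanup B's keys
│
└─> Cleanup A's keys               ← A may have run with B's keys
```

**In short:** injecting into `os.environ` is only safe if each worker process handles one request at a time. Threading or multiprocessing makes it even less predictable.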

---

## 🎛️ Modes Compared

### **Local Mode (DEPLOY_ENV=local):**

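```
Request arrives
   ↓
Middleware detects DEPLOY_ENV=local
   ↓
SKIP injection (keys are already in the process environment, loaded from .env)
   ↓
os.getenv('GEMINI_API_KEY') → returns the value loaded from .env
   ↓
Works! ✅
```

Note that `os.getenv()` never reads a `.env` file by itself; local mode assumes `.env` was loaded into the process environment at startup (for example with python-dotenv's `load_dotenv()` or by your process manager).
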
### **Production Mode (DEPLOY_ENV=production):**

```
Request arrives with user_id=user_123
   ↓
Middleware detects DEPLOY_ENV=production
   ↓
Fetch user_123's keys from database
   ↓
Inject into os.environ (temporarily)
   ↓
os.getenv('GEMINI_API_KEY') → gets user_123's key
   ↓
Process request
   ↓
Clean up os.environ
   ↓
Works! ✅
```

---

## 🚨 Important Caveats

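### **1. Not Safe Under Concurrent Requests**

This approach is safe ONLY when each worker process handles one request at a time. asyncio's single-threaded event loop does **not** give you that: concurrent requests interleave at every `await`, as the Concurrent Requests Example shows.

**It gets worse if you use:**
- `concurrent.futures.ThreadPoolExecutor` or `threading.Thread`: threads share the same `os.environ` and can switch at almost any point, not only at an `await`
- `multiprocessing.Pool`: each worker process has its own copy of `os.environ`, taken when the worker starts, so keys injected later never reach it
- Plain `def` endpoints or dependencies, which FastAPI runs in a thread pool

In all of these cases environment injection is **NOT SAFE** and will cause race conditions or wrong keys!
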
### **2. Better Long-Term Approach**

For critical services, refactor to pass `user_id` explicitly:

```python
import os

# user_api_keys(user_id) is the project's context manager for a user's stored keys.

# Instead of:
def generate(prompt: str):
    key = os.getenv('GEMINI_API_KEY')  # Fragile: depends on global state

# Do this:
def generate(user_id: str, prompt: str):
    with user_api_keys(user_id) as keys:
        key = keys['gemini']  # Explicit and safe!
```
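
A runnable version of the explicit approach, with hypothetical in-memory stand-ins for the project's `user_api_keys()` and for the Gemini call (`FAKE_KEY_STORE` and `call_gemini` are made up). Because the key travels as an ordinary argument, two interleaved requests cannot see each other's keys, no matter where the event loop switches.

```python
import asyncio
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Hypothetical in-memory stand-in for the project's key store
FAKE_KEY_STORE: Dict[str, Dict[str, str]] = {
    "user_a": {"gemini": "key_a_123"},
    "user_b": {"gemini": "key_b_456"},
}


@contextmanager
def user_api_keys(user_id: str) -> Iterator[Dict[str, str]]:
    """Stand-in for the project's user_api_keys(); yields a copy of the user's keys."""
    yield dict(FAKE_KEY_STORE.get(user_id, {}))


async def call_gemini(api_key: str, prompt: str) -> str:
    """Stand-in for the real LLM call: the key is an argument, not read from os.environ."""
    await asyncio.sleep(0.01)  # the event loop may switch to other requests here
    return f"{api_key}:{prompt}"


async def generate(user_id: str, prompt: str) -> str:
    with user_api_keys(user_id) as keys:
        key = keys["gemini"]
    return await call_gemini(key, prompt)


async def main() -> List[str]:
    return await asyncio.gather(generate("user_a", "hello"), generate("user_b", "hi"))


results = asyncio.run(main())
print(results)  # ['key_a_123:hello', 'key_b_456:hi']
```
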

---

## 📝 Summary

### **The Magic:**

1. **Request arrives** → Middleware extracts `user_id`
2. **Fetch from DB** → Get user's keys
3. **Inject temporarily** → `os.environ['GEMINI_API_KEY'] = user_key`
4. **Process request** → `os.getenv()` calls get the user's key (unless another request overwrote it in the meantime)
5. **Cleanup** → Restore `os.environ` to its previous state
6. **Next request** → Different user, different keys

### **Why It Works (for Non-Overlapping Requests):**
- ✅ Injection is request-scoped
- ✅ Cleanup is guaranteed (`finally` block)
- ✅ Existing code works without changes
- ✅ Each user gets their own keys, as long as requests in the same worker don't overlap

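### **Limitations:**
- ⚠️ Not safe under concurrent requests: asyncio interleaves requests at every `await`, so one user's request can read another user's key
- ⚠️ Even less safe with threading/multiprocessing (including FastAPI's thread pool for `def` endpoints)
- ⚠️ Slightly slower (one DB query per request)
- ⚠️ Better to refactor critical services
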
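### **Bottom Line:**

> **It works, with a catch.** Your existing code that uses `os.getenv()` gets user-specific keys in production with zero code changes, and the middleware handles everything automatically, but only while requests in a worker don't overlap. Under concurrent load, keys can leak between users, so treat the middleware as a stopgap.
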
---

## 🔄 Migration Path

### **Phase 1: Now (Compatibility Layer)**
- ✅ Middleware injects keys for ALL services
- ✅ No code changes needed
- ✅ Works immediately

### **Phase 2: Later (Gradual Refactor)**
- Refactor critical services to use `UserAPIKeyContext` directly
- Remove dependency on `os.getenv()`
- More explicit, safer

### **Phase 3: Future (Full Migration)**
- All services use `user_api_keys(user_id)`
- Remove injection middleware
- Clean, explicit architecture

**For now:** Middleware lets you deploy immediately without touching 100+ files! 🎉