ALwrity/backend/docs/STABILITY_AI_INTEGRATION.md
2025-09-23 12:38:27 +05:30


# Stability AI Integration Documentation
This document describes the Stability AI integration in the ALwrity backend: its architecture, API endpoints, configuration, and usage.
## Overview
The Stability AI integration provides access to all major Stability AI services including:
- **Image Generation**: Ultra, Core, and SD3.5 models
- **Image Editing**: Erase, Inpaint, Outpaint, Search & Replace, Search & Recolor, Background Removal
- **Image Upscaling**: Fast, Conservative, and Creative upscaling
- **Image Control**: Sketch, Structure, Style, and Style Transfer control
- **3D Generation**: Fast 3D and Point-Aware 3D model generation
- **Audio Generation**: Text-to-Audio, Audio-to-Audio, and Audio Inpainting
- **Legacy V1 APIs**: SDXL 1.0 and other V1 engines
## Architecture
### Modular Structure
```
backend/
├── models/
│   └── stability_models.py          # Pydantic models for all API schemas
├── services/
│   └── stability_service.py         # Core service class with HTTP client
├── routers/
│   ├── stability.py                 # Main API endpoints
│   ├── stability_advanced.py        # Advanced workflows and features
│   └── stability_admin.py           # Admin and monitoring endpoints
├── middleware/
│   └── stability_middleware.py      # Rate limiting, caching, monitoring
├── utils/
│   └── stability_utils.py           # Utility functions and validators
├── config/
│   └── stability_config.py          # Configuration and constants
└── test/
    └── test_stability_endpoints.py  # Comprehensive test suite
```
### Key Components
1. **StabilityAIService**: Core service class handling all API interactions
2. **Pydantic Models**: Comprehensive request/response models with validation
3. **FastAPI Routers**: Organized endpoints for different service categories
4. **Middleware**: Rate limiting, caching, monitoring, and content moderation
5. **Utilities**: File handling, validation, optimization, and workflow management
## API Endpoints
### Generation Endpoints
#### POST `/api/stability/generate/ultra`
Generate high-quality images using Stable Image Ultra.
**Parameters:**
- `prompt` (required): Text description of desired image
- `image` (optional): Input image for image-to-image generation
- `negative_prompt` (optional): What you don't want to see
- `aspect_ratio` (optional): Image aspect ratio (default: "1:1")
- `seed` (optional): Random seed (0-4294967294)
- `output_format` (optional): Output format (jpeg, png, webp)
- `style_preset` (optional): Style preset
- `strength` (optional): Image influence strength (required if image provided)
**Response:** Image bytes or JSON with generation ID
**Cost:** 8 credits per generation
#### POST `/api/stability/generate/core`
Fast and affordable image generation.
**Parameters:**
- `prompt` (required): Text description
- `negative_prompt` (optional): Negative prompt
- `aspect_ratio` (optional): Image aspect ratio
- `seed` (optional): Random seed
- `output_format` (optional): Output format
- `style_preset` (optional): Style preset
**Cost:** 3 credits per generation
#### POST `/api/stability/generate/sd3`
Generate using Stable Diffusion 3.5 models.
**Parameters:**
- `prompt` (required): Text description
- `mode` (optional): "text-to-image" or "image-to-image"
- `image` (optional): Input image (required for image-to-image)
- `strength` (optional): Image influence (required for image-to-image)
- `aspect_ratio` (optional): Image aspect ratio (text-to-image only)
- `model` (optional): SD3 model variant
- `cfg_scale` (optional): CFG scale (1-10)
**Cost:** 2.5-6.5 credits depending on model
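The mode-dependent rules above (an input image and `strength` are required for image-to-image, `aspect_ratio` applies to text-to-image only, `cfg_scale` is 1-10) can be checked on the client before a request is sent. The following is a minimal sketch under those stated rules; `validate_sd3_request` and its `has_image` flag are hypothetical names for illustration, not part of the ALwrity backend.

```python
from typing import List, Optional


def validate_sd3_request(
    prompt: str,
    mode: str = "text-to-image",
    has_image: bool = False,
    strength: Optional[float] = None,
    aspect_ratio: Optional[str] = None,
    cfg_scale: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    if not prompt.strip():
        errors.append("prompt is required")
    if mode not in ("text-to-image", "image-to-image"):
        errors.append(f"unknown mode: {mode}")
    if mode == "image-to-image":
        # image and strength are both required in this mode
        if not has_image:
            errors.append("image is required for image-to-image")
        if strength is None:
            errors.append("strength is required for image-to-image")
        # aspect_ratio is documented as text-to-image only
        if aspect_ratio is not None:
            errors.append("aspect_ratio applies to text-to-image only")
    if cfg_scale is not None and not 1 <= cfg_scale <= 10:
        errors.append("cfg_scale must be between 1 and 10")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.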
### Edit Endpoints
#### POST `/api/stability/edit/erase`
Remove unwanted objects using masks.
**Parameters:**
- `image` (required): Image file to edit
- `mask` (optional): Mask image (or use alpha channel)
- `grow_mask` (optional): Mask edge growth (0-20 pixels)
- `seed` (optional): Random seed
- `output_format` (optional): Output format
**Cost:** 5 credits per generation
#### POST `/api/stability/edit/inpaint`
Fill or replace specified areas with new content.
**Parameters:**
- `image` (required): Image file to edit
- `prompt` (required): Description of desired content
- `mask` (optional): Mask image
- `negative_prompt` (optional): Negative prompt
- `grow_mask` (optional): Mask edge growth (0-100 pixels)
- `style_preset` (optional): Style preset
**Cost:** 5 credits per generation
#### POST `/api/stability/edit/outpaint`
Expand image in specified directions.
**Parameters:**
- `image` (required): Image file to expand
- `left` (optional): Pixels to expand left (0-2000)
- `right` (optional): Pixels to expand right (0-2000)
- `up` (optional): Pixels to expand up (0-2000)
- `down` (optional): Pixels to expand down (0-2000)
- `creativity` (optional): Creativity level (0-1)
- `prompt` (optional): Guidance prompt
**Note:** At least one direction must be specified.
**Cost:** 4 credits per generation
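The direction constraints above lend themselves to a small pre-flight check. This is an illustrative sketch only; `build_outpaint_fields` is a hypothetical helper, and the ranges are taken from the parameter list for this endpoint.

```python
from typing import Dict, Optional


def build_outpaint_fields(
    left: int = 0,
    right: int = 0,
    up: int = 0,
    down: int = 0,
    creativity: Optional[float] = None,
) -> Dict[str, float]:
    """Validate outpaint directions and return only the fields worth sending."""
    directions = {"left": left, "right": right, "up": up, "down": down}
    for name, value in directions.items():
        if not 0 <= value <= 2000:
            raise ValueError(f"{name} must be between 0 and 2000 pixels")
    # At least one direction must be non-zero.
    if not any(directions.values()):
        raise ValueError("at least one direction must be specified")
    if creativity is not None and not 0 <= creativity <= 1:
        raise ValueError("creativity must be between 0 and 1")
    fields: Dict[str, float] = {k: v for k, v in directions.items() if v > 0}
    if creativity is not None:
        fields["creativity"] = creativity
    return fields
```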
#### POST `/api/stability/edit/search-and-replace`
Replace objects using text prompts instead of masks.
**Parameters:**
- `image` (required): Image file to edit
- `prompt` (required): Description of replacement
- `search_prompt` (required): What to search for
- `grow_mask` (optional): Mask edge growth (0-20 pixels)
**Cost:** 5 credits per generation
#### POST `/api/stability/edit/search-and-recolor`
Change colors of specific objects using prompts.
**Parameters:**
- `image` (required): Image file to edit
- `prompt` (required): Description of new colors
- `select_prompt` (required): What to select for recoloring
**Cost:** 5 credits per generation
#### POST `/api/stability/edit/remove-background`
Remove background from images.
**Parameters:**
- `image` (required): Image file
- `output_format` (optional): Output format (png, webp)
**Cost:** 5 credits per generation
### Upscale Endpoints
#### POST `/api/stability/upscale/fast`
Fast 4x upscaling (~1 second processing).
**Parameters:**
- `image` (required): Image file to upscale
- `output_format` (optional): Output format
**Cost:** 2 credits per generation
#### POST `/api/stability/upscale/conservative`
Conservative upscaling to 4K with minimal changes.
**Parameters:**
- `image` (required): Image file to upscale
- `prompt` (required): Description for guidance
- `creativity` (optional): Creativity level (0.2-0.5)
**Cost:** 40 credits per generation
#### POST `/api/stability/upscale/creative`
Creative upscaling for highly degraded images (async).
**Parameters:**
- `image` (required): Image file to upscale
- `prompt` (required): Description for guidance
- `creativity` (optional): Creativity level (0.1-0.5)
- `style_preset` (optional): Style preset
**Cost:** 60 credits per generation
### Control Endpoints
#### POST `/api/stability/control/sketch`
Generate refined images from sketches.
**Parameters:**
- `image` (required): Sketch or line art
- `prompt` (required): Description of desired result
- `control_strength` (optional): Control strength (0-1)
**Cost:** 5 credits per generation
#### POST `/api/stability/control/structure`
Maintain structure while changing content.
**Parameters:**
- `image` (required): Structure reference image
- `prompt` (required): Description of desired result
- `control_strength` (optional): Control strength (0-1)
**Cost:** 5 credits per generation
#### POST `/api/stability/control/style`
Extract and apply style from reference image.
**Parameters:**
- `image` (required): Style reference image
- `prompt` (required): Description of desired result
- `aspect_ratio` (optional): Output aspect ratio
- `fidelity` (optional): Style fidelity (0-1)
**Cost:** 5 credits per generation
#### POST `/api/stability/control/style-transfer`
Transfer style between two images.
**Parameters:**
- `init_image` (required): Image to restyle
- `style_image` (required): Style reference
- `style_strength` (optional): Style strength (0-1)
- `composition_fidelity` (optional): Composition preservation (0-1)
**Cost:** 8 credits per generation
### 3D Endpoints
#### POST `/api/stability/3d/stable-fast-3d`
Generate 3D models from 2D images (fast).
**Parameters:**
- `image` (required): 2D image to convert
- `texture_resolution` (optional): Texture resolution (512, 1024, 2048)
- `foreground_ratio` (optional): Object size ratio (0.1-1)
- `remesh` (optional): Remesh algorithm (none, triangle, quad)
**Output:** GLB 3D model file
**Cost:** 10 credits per generation
#### POST `/api/stability/3d/stable-point-aware-3d`
Advanced 3D generation with editing capabilities.
**Parameters:**
- `image` (required): 2D image to convert
- `texture_resolution` (optional): Texture resolution
- `foreground_ratio` (optional): Object size ratio (1-2)
- `target_type` (optional): Simplification target (none, vertex, face)
- `guidance_scale` (optional): Guidance scale (1-10)
**Cost:** 4 credits per generation
### Audio Endpoints
#### POST `/api/stability/audio/text-to-audio`
Generate audio from text descriptions.
**Parameters:**
- `prompt` (required): Audio description
- `duration` (optional): Duration in seconds (1-190)
- `model` (optional): Audio model (stable-audio-2, stable-audio-2.5)
- `steps` (optional): Sampling steps (model-dependent)
- `cfg_scale` (optional): CFG scale (1-25)
**Cost:** 20 credits per generation
#### POST `/api/stability/audio/audio-to-audio`
Transform audio using text instructions.
**Parameters:**
- `prompt` (required): Transformation description
- `audio` (required): Input audio file
- `duration` (optional): Output duration (1-190)
- `strength` (optional): Input influence (0-1)
**Cost:** 20 credits per generation
### Results Endpoint
#### GET `/api/stability/results/{generation_id}`
Get results from async generations.
**Parameters:**
- `generation_id` (required): ID from async operation
- `accept_type` (optional): Response format preference
**Response:** Generated content or status update
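A client typically polls this endpoint until the async job finishes. The sketch below assumes the endpoint answers `202` while the job is still running and `200` once the content is ready; that mirrors the upstream Stability AI results API but is an assumption about this backend. `poll_result` and the `fetch` callable are hypothetical, and `fetch` is injected so the loop can be exercised without a network.

```python
import time
from typing import Any, Callable, Tuple


def poll_result(
    fetch: Callable[[str], Tuple[int, Any]],
    generation_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(generation_id) until it reports completion.

    fetch must return (status_code, body). Assumed contract:
    200 = finished, 202 = still in progress, anything else = failure.
    """
    for _ in range(max_attempts):
        status, body = fetch(generation_id)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"generation {generation_id} failed with HTTP {status}")
        sleep(interval)
    raise TimeoutError(f"generation {generation_id} not ready after {max_attempts} attempts")
```

In practice `fetch` could wrap a `requests.get` call against `/api/stability/results/{generation_id}` and return `(response.status_code, response.content)`.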
## Advanced Features
### Workflow Processing
The integration supports complex multi-step workflows:
```python
# Example workflow
workflow = [
    {"operation": "generate_core", "parameters": {"prompt": "a landscape"}},
    {"operation": "upscale_fast", "parameters": {}},
    {"operation": "inpaint", "parameters": {"prompt": "add a house"}},
]
```
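One way to picture how such a step list is executed is a loop that feeds each step's output into the next. The sketch below is an assumption about the general shape, not the backend's actual workflow engine; `run_workflow` and the `handlers` registry are hypothetical.

```python
from typing import Any, Callable, Dict, List


def run_workflow(
    steps: List[Dict[str, Any]],
    handlers: Dict[str, Callable[..., Any]],
    initial: Any = None,
) -> Any:
    """Run steps in order, passing each result to the next handler.

    Each handler is called as handler(previous_result, **parameters).
    """
    result = initial
    for index, step in enumerate(steps):
        operation = step["operation"]
        if operation not in handlers:
            raise KeyError(f"step {index}: unknown operation {operation!r}")
        result = handlers[operation](result, **step.get("parameters", {}))
    return result
```

Looking handlers up by name keeps the workflow definition plain data, so it can be stored or sent as JSON.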
### Batch Processing
Process multiple images with the same operation:
```http
POST /api/stability/advanced/batch/process-folder
```
### Model Comparison
Compare results across different models:
```http
POST /api/stability/advanced/compare/models
```
### AI Director Mode
Automated creative decision making:
```http
POST /api/stability/advanced/experimental/ai-director
```
## Configuration
### Environment Variables
```bash
STABILITY_API_KEY=your_api_key_here
STABILITY_BASE_URL=https://api.stability.ai # Optional
STABILITY_TIMEOUT=300 # Optional
STABILITY_MAX_RETRIES=3 # Optional
STABILITY_MAX_FILE_SIZE=10485760 # Optional (10MB)
```
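A minimal sketch of reading these variables at startup follows. It assumes the values shown above double as defaults for the optional settings, which the source does not state explicitly; `StabilitySettings` and `load_settings` are hypothetical names and may differ from what `stability_config.py` actually defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StabilitySettings:
    api_key: str
    base_url: str
    timeout: int
    max_retries: int
    max_file_size: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> StabilitySettings:
    """Build settings from environment variables, failing fast without a key."""
    env = os.environ if env is None else env
    api_key = env.get("STABILITY_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("STABILITY_API_KEY is not set")
    return StabilitySettings(
        api_key=api_key,
        # Defaults below are assumed from the example values in this document.
        base_url=env.get("STABILITY_BASE_URL", "https://api.stability.ai"),
        timeout=int(env.get("STABILITY_TIMEOUT", "300")),
        max_retries=int(env.get("STABILITY_MAX_RETRIES", "3")),
        max_file_size=int(env.get("STABILITY_MAX_FILE_SIZE", "10485760")),
    )
```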
### Rate Limiting
- **Default Limit**: 150 requests per 10 seconds
- **Timeout**: 60 seconds when limit exceeded
- **Configurable**: Can be adjusted in middleware
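To illustrate what a limit of 150 requests per 10 seconds means, here is a sliding-window limiter sketch. It is not the middleware's actual implementation; `SlidingWindowLimiter` is a hypothetical class, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(
        self,
        max_requests: int = 150,
        window_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```

A per-client limiter would keep one such instance per IP address or API key.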
### File Size Limits
- **Images**: 10MB maximum
- **Audio**: 50MB maximum
- **3D Models**: 10MB maximum
### Image Requirements
#### Generate Operations
- **Minimum**: 4,096 pixels total
- **Maximum**: 16,777,216 pixels total (~16.8MP)
- **Dimensions**: At least 64x64 pixels
#### Edit Operations
- **Minimum**: 4,096 pixels total
- **Maximum**: 9,437,184 pixels total (~9.4MP)
- **Aspect Ratio**: Between 1:2.5 and 2.5:1
#### Upscale Operations
- **Fast**: 1,024 to 1,048,576 pixels, 32-1536px dimensions
- **Conservative**: 4,096 to 9,437,184 pixels
- **Creative**: 4,096 to 1,048,576 pixels
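The generate and edit limits above can be checked before upload. The sketch below encodes only those two rule sets as listed; `LIMITS` and `check_image` are hypothetical names, and the real validators in `stability_utils.py` may work differently.

```python
from typing import Dict, List

# Limits copied from the "Generate Operations" and "Edit Operations" lists.
LIMITS: Dict[str, Dict[str, float]] = {
    "generate": {"min_pixels": 4_096, "max_pixels": 16_777_216, "min_side": 64},
    "edit": {"min_pixels": 4_096, "max_pixels": 9_437_184, "max_ratio": 2.5},
}


def check_image(width: int, height: int, operation: str) -> List[str]:
    """Return a list of violated limits for the given image size."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    limits = LIMITS[operation]
    problems = []
    pixels = width * height
    if pixels < limits["min_pixels"]:
        problems.append(f"too few pixels: {pixels}")
    if pixels > limits["max_pixels"]:
        problems.append(f"too many pixels: {pixels}")
    if min(width, height) < limits.get("min_side", 1):
        problems.append("shortest side is below the minimum dimension")
    max_ratio = limits.get("max_ratio")
    if max_ratio is not None:
        # Covers both 1:2.5 and 2.5:1 by comparing long side to short side.
        if max(width, height) / min(width, height) > max_ratio:
            problems.append("aspect ratio is outside 1:2.5 to 2.5:1")
    return problems
```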
## Usage Examples
### Basic Text-to-Image Generation
```python
import requests

# Send the prompt as form data; on success the body contains the image bytes.
response = requests.post(
    "http://localhost:8000/api/stability/generate/ultra",
    data={
        "prompt": "A majestic mountain landscape at sunset",
        "aspect_ratio": "16:9",
        "style_preset": "photographic",
    },
)
if response.status_code == 200:
    with open("generated_image.png", "wb") as f:
        f.write(response.content)
```
### Image Editing with Inpainting
```python
import requests

# Open both files in a context manager so they are closed after the upload.
with open("input.png", "rb") as image, open("mask.png", "rb") as mask:
    response = requests.post(
        "http://localhost:8000/api/stability/edit/inpaint",
        files={"image": image, "mask": mask},
        data={
            "prompt": "a beautiful garden",
            "grow_mask": 10,
        },
    )
```
### Audio Generation
```python
import requests

response = requests.post(
    "http://localhost:8000/api/stability/audio/text-to-audio",
    data={
        "prompt": "Peaceful piano music with nature sounds",
        "duration": 60,
        "model": "stable-audio-2.5",
    },
)
if response.status_code == 200:
    with open("generated_audio.mp3", "wb") as f:
        f.write(response.content)
```
### 3D Model Generation
```python
import requests

# Upload the source image and close the file once the request completes.
with open("object.png", "rb") as image:
    response = requests.post(
        "http://localhost:8000/api/stability/3d/stable-fast-3d",
        files={"image": image},
        data={
            "texture_resolution": "1024",
            "foreground_ratio": 0.85,
        },
    )
if response.status_code == 200:
    with open("model.glb", "wb") as f:
        f.write(response.content)
```
## Error Handling
Failed requests return a standard HTTP status code together with a JSON error body:
### Common Error Codes
- **400**: Invalid parameters or file format
- **403**: Content moderation flag or insufficient permissions
- **413**: File too large
- **422**: Request well-formed but rejected
- **429**: Rate limit exceeded
- **500**: Internal server error
### Error Response Format
```json
{
  "id": "error_id",
  "name": "error_name",
  "errors": ["Detailed error messages"]
}
```
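A client can turn this body into a single readable message. The following sketch relies only on the three fields shown above and falls back gracefully when the body is not JSON; `format_error` is a hypothetical helper.

```python
import json
from typing import Union


def format_error(status_code: int, body: Union[str, bytes]) -> str:
    """Summarize an error response using its id, name, and errors fields."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Not JSON (for example an HTML error page from a proxy).
        return f"HTTP {status_code}: unparseable error body"
    if not isinstance(payload, dict):
        return f"HTTP {status_code}: unexpected error body"
    name = payload.get("name", "unknown_error")
    error_id = payload.get("id", "no-id")
    details = "; ".join(str(item) for item in payload.get("errors", []))
    return f"HTTP {status_code} {name} ({error_id}): {details}"
```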
## Monitoring and Analytics
### Health Check Endpoints
- `GET /api/stability/health` - Basic health check
- `GET /api/stability/admin/health/detailed` - Comprehensive health check
### Statistics Endpoints
- `GET /api/stability/admin/stats` - Service statistics
- `GET /api/stability/admin/usage/summary` - Usage summary
- `GET /api/stability/admin/request-logs` - Request logs
### Cost Estimation
- `GET /api/stability/admin/costs/estimate` - Estimate operation costs
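For a quick local estimate, the per-operation costs listed under API Endpoints can be summed directly. This sketch is not how the admin endpoint computes its answer; `CREDIT_COSTS` and `estimate_credits` are hypothetical, the operation keys other than `generate_core`, `upscale_fast`, and `inpaint` are made-up labels, and SD3 is omitted because its cost varies by model.

```python
from typing import Dict, Iterable

# Credits per generation, copied from the endpoint descriptions in this document.
CREDIT_COSTS: Dict[str, int] = {
    "generate_ultra": 8,
    "generate_core": 3,
    "erase": 5,
    "inpaint": 5,
    "outpaint": 4,
    "upscale_fast": 2,
    "upscale_conservative": 40,
    "upscale_creative": 60,
    "style_transfer": 8,
    "stable_fast_3d": 10,
    "text_to_audio": 20,
}


def estimate_credits(operations: Iterable[str]) -> int:
    """Sum the documented credit cost of each operation in the sequence."""
    total = 0
    for operation in operations:
        if operation not in CREDIT_COSTS:
            raise KeyError(f"no documented cost for {operation!r}")
        total += CREDIT_COSTS[operation]
    return total
```

For example, the three-step workflow of Core generation, fast upscale, and inpaint costs 3 + 2 + 5 = 10 credits.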
## Best Practices
### Prompt Optimization
1. **Be Specific**: Use detailed, descriptive language
2. **Include Style**: Specify artistic style or photographic type
3. **Add Quality Terms**: Include "high quality", "detailed", "sharp"
4. **Use Negative Prompts**: Specify what you don't want
### Image Preparation
1. **Check Dimensions**: Ensure images meet size requirements
2. **Optimize File Size**: Compress large images before upload
3. **Use Appropriate Formats**: PNG for transparency, JPEG for photos
4. **Validate Aspect Ratios**: Check ratio requirements for operations
### Performance Optimization
1. **Use Appropriate Models**: Choose model based on speed vs quality needs
2. **Batch Operations**: Use batch endpoints for multiple similar operations
3. **Cache Results**: Enable caching for repeated operations
4. **Monitor Usage**: Track credit usage and optimize accordingly
## Security Considerations
### API Key Management
- Store API keys securely in environment variables
- Never commit API keys to version control
- Rotate keys regularly
- Monitor key usage for unauthorized access
### Content Moderation
- Built-in content moderation middleware
- Configurable blocked terms
- Automatic flagging of inappropriate content
- Audit logging for compliance
### Rate Limiting
- Automatic rate limiting per client
- Configurable limits and timeouts
- IP-based and API key-based limiting
- Graceful handling of limit exceeded scenarios
## Troubleshooting
### Common Issues
#### "API key missing or invalid"
- Check STABILITY_API_KEY environment variable
- Verify key is correct and active
- Check account balance
#### "Rate limit exceeded"
- Wait for timeout period (60 seconds)
- Implement request queuing
- Consider upgrading API plan
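Waiting and retrying after a 429 can be automated with exponential backoff. The sketch below is one possible client-side approach, not something the backend is documented to provide; `call_with_retry` and `send` are hypothetical, and `base_delay` can be raised toward the 60-second timeout if shorter waits keep failing.

```python
import time
from typing import Any, Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() and retry on HTTP 429, doubling the delay each time.

    send must return (status_code, body). The last response is returned
    even if it is still a 429 after max_retries retries.
    """
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    return status, body
```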
#### "File too large"
- Compress images before upload
- Check file size limits for operation
- Use appropriate image formats
#### "Invalid image dimensions"
- Check minimum/maximum pixel requirements
- Validate aspect ratio constraints
- Resize image if necessary
### Debug Endpoints
- `POST /api/stability/admin/debug/test-connection` - Test API connectivity
- `GET /api/stability/admin/debug/request-logs` - View recent requests
- `POST /api/stability/utils/image-info` - Analyze image properties
## Integration Examples
### React Frontend Integration
```javascript
// Upload and generate
const formData = new FormData();
formData.append('prompt', 'A beautiful landscape');
formData.append('aspect_ratio', '16:9');

const response = await fetch('/api/stability/generate/ultra', {
  method: 'POST',
  body: formData
});

if (response.ok) {
  const blob = await response.blob();
  const imageUrl = URL.createObjectURL(blob);
  // Display image
}
```
### Python Service Integration
```python
from typing import List

from services.stability_service import StabilityAIService


async def generate_content_images(prompts: List[str]):
    """Generate one Core image per prompt, sequentially."""
    service = StabilityAIService()
    async with service:
        results = []
        for prompt in prompts:
            result = await service.generate_core(
                prompt=prompt,
                aspect_ratio="16:9",
            )
            results.append(result)
        return results
```
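The loop above issues requests one at a time. If throughput matters, the same work can run with bounded concurrency. The sketch below is generic and makes no claims about `StabilityAIService`; `generate_all` is hypothetical, and `generate` stands for any coroutine function that takes a prompt, such as a small wrapper around `service.generate_core`.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def generate_all(
    prompts: List[str],
    generate: Callable[[str], Awaitable[Any]],
    max_concurrent: int = 4,
) -> List[Any]:
    """Run generate(prompt) for every prompt, at most max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(prompt: str) -> Any:
        async with semaphore:
            return await generate(prompt)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(worker(p) for p in prompts))
```

Keeping `max_concurrent` small helps stay within the rate limit described under Configuration.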
## Performance Metrics
### Typical Response Times
- **Fast Operations** (Fast Upscale): ~1-2 seconds
- **Standard Operations** (Core Generation): ~5-10 seconds
- **Complex Operations** (Ultra Generation): ~10-20 seconds
- **Heavy Operations** (Creative Upscale): ~30-60 seconds
### Throughput
- **Rate Limit**: 150 requests per 10 seconds
- **Concurrent Requests**: Limited by API key
- **Batch Processing**: Recommended for multiple operations
## Future Enhancements
### Planned Features
1. **Advanced Caching**: Redis-based caching for better performance
2. **Queue Management**: Async job queue for heavy operations
3. **Result Storage**: Persistent storage for generated content
4. **Analytics Dashboard**: Real-time usage analytics
5. **Custom Workflows**: Visual workflow builder
6. **A/B Testing**: Compare different approaches automatically
### API Extensions
1. **Webhook Support**: Real-time notifications for async operations
2. **Streaming Responses**: Progressive image generation updates
3. **Template System**: Predefined generation templates
4. **Collaboration Features**: Shared workspaces and results
## Support
For issues and questions:
1. Check the troubleshooting section above
2. Review the test suite for usage examples
3. Check Stability AI documentation: https://platform.stability.ai/docs
4. Contact support through the admin panel
## Version History
- **v1.0.0**: Initial implementation with all major Stability AI features
  - Complete API coverage for v2beta endpoints
  - Legacy v1 API support
  - Comprehensive middleware and utilities
  - Full test suite and documentation