ALwrity/backend/docs/STABILITY_AI_INTEGRATION.md
2025-09-23 12:38:27 +05:30

Stability AI Integration Documentation

This document describes the Stability AI integration in the ALwrity backend: its architecture, API endpoints, configuration, and usage.

Overview

The Stability AI integration provides access to all major Stability AI services including:

  • Image Generation: Ultra, Core, and SD3.5 models
  • Image Editing: Erase, Inpaint, Outpaint, Search & Replace, Search & Recolor, Background Removal
  • Image Upscaling: Fast, Conservative, and Creative upscaling
  • Image Control: Sketch, Structure, Style, and Style Transfer control
  • 3D Generation: Fast 3D and Point-Aware 3D model generation
  • Audio Generation: Text-to-Audio, Audio-to-Audio, and Audio Inpainting
  • Legacy V1 APIs: SDXL 1.0 and other V1 engines

Architecture

Modular Structure

backend/
├── models/
│   └── stability_models.py          # Pydantic models for all API schemas
├── services/
│   └── stability_service.py         # Core service class with HTTP client
├── routers/
│   ├── stability.py                 # Main API endpoints
│   ├── stability_advanced.py        # Advanced workflows and features
│   └── stability_admin.py           # Admin and monitoring endpoints
├── middleware/
│   └── stability_middleware.py      # Rate limiting, caching, monitoring
├── utils/
│   └── stability_utils.py           # Utility functions and validators
├── config/
│   └── stability_config.py          # Configuration and constants
└── test/
    └── test_stability_endpoints.py  # Comprehensive test suite

Key Components

  1. StabilityAIService: Core service class handling all API interactions
  2. Pydantic Models: Comprehensive request/response models with validation
  3. FastAPI Routers: Organized endpoints for different service categories
  4. Middleware: Rate limiting, caching, monitoring, and content moderation
  5. Utilities: File handling, validation, optimization, and workflow management

API Endpoints

Generation Endpoints

POST /api/stability/generate/ultra

Generate high-quality images using Stable Image Ultra.

Parameters:

  • prompt (required): Text description of desired image
  • image (optional): Input image for image-to-image generation
  • negative_prompt (optional): What you don't want to see
  • aspect_ratio (optional): Image aspect ratio (default: "1:1")
  • seed (optional): Random seed (0-4294967294)
  • output_format (optional): Output format (jpeg, png, webp)
  • style_preset (optional): Style preset
  • strength (optional): Image influence strength (required if image provided)

Response: Image bytes or JSON with generation ID

Cost: 8 credits per generation
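
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the backend's own validation: `build_ultra_form` is a hypothetical helper, and it only encodes the constraints stated in the list (seed range, allowed output formats, and `strength` being required when an image is supplied).

```python
from typing import Optional

MAX_SEED = 4294967294  # upper bound documented for the seed parameter
ALLOWED_FORMATS = {"jpeg", "png", "webp"}


def build_ultra_form(
    prompt: str,
    *,
    has_image: bool = False,
    strength: Optional[float] = None,
    seed: int = 0,
    aspect_ratio: str = "1:1",
    output_format: Optional[str] = None,
) -> dict:
    """Validate Ultra parameters and return form fields (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in 0-{MAX_SEED}")
    if output_format is not None and output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if has_image and strength is None:
        raise ValueError("strength is required when an image is provided")

    form = {"prompt": prompt, "seed": seed, "aspect_ratio": aspect_ratio}
    if output_format is not None:
        form["output_format"] = output_format
    if has_image:
        form["strength"] = strength
    return form
```

The returned dictionary can be passed as the `data` argument of a form POST to `/api/stability/generate/ultra`.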

POST /api/stability/generate/core

Fast and affordable image generation.

Parameters:

  • prompt (required): Text description
  • negative_prompt (optional): Negative prompt
  • aspect_ratio (optional): Image aspect ratio
  • seed (optional): Random seed
  • output_format (optional): Output format
  • style_preset (optional): Style preset

Cost: 3 credits per generation

POST /api/stability/generate/sd3

Generate using Stable Diffusion 3.5 models.

Parameters:

  • prompt (required): Text description
  • mode (optional): "text-to-image" or "image-to-image"
  • image (optional): Input image (required for image-to-image)
  • strength (optional): Image influence (required for image-to-image)
  • aspect_ratio (optional): Image aspect ratio (text-to-image only)
  • model (optional): SD3 model variant
  • cfg_scale (optional): CFG scale (1-10)

Cost: 2.5-6.5 credits depending on model
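
The two SD3 modes have different required fields, which is easy to get wrong. As a rough illustration of the rules listed above, the sketch below collects every violation instead of stopping at the first one. `validate_sd3_request` is a hypothetical name, and the backend's Pydantic models may enforce these rules differently.

```python
from typing import List, Optional


def validate_sd3_request(
    mode: str = "text-to-image",
    *,
    has_image: bool = False,
    strength: Optional[float] = None,
    aspect_ratio: Optional[str] = None,
    cfg_scale: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors: List[str] = []
    if mode not in ("text-to-image", "image-to-image"):
        errors.append(f"unknown mode: {mode}")
    if mode == "image-to-image":
        if not has_image:
            errors.append("image is required for image-to-image")
        if strength is None:
            errors.append("strength is required for image-to-image")
        if aspect_ratio is not None:
            errors.append("aspect_ratio applies to text-to-image only")
    if cfg_scale is not None and not 1 <= cfg_scale <= 10:
        errors.append("cfg_scale must be between 1 and 10")
    return errors
```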

Edit Endpoints

POST /api/stability/edit/erase

Remove unwanted objects using masks.

Parameters:

  • image (required): Image file to edit
  • mask (optional): Mask image (or use alpha channel)
  • grow_mask (optional): Mask edge growth (0-20 pixels)
  • seed (optional): Random seed
  • output_format (optional): Output format

Cost: 5 credits per generation

POST /api/stability/edit/inpaint

Fill or replace specified areas with new content.

Parameters:

  • image (required): Image file to edit
  • prompt (required): Description of desired content
  • mask (optional): Mask image
  • negative_prompt (optional): Negative prompt
  • grow_mask (optional): Mask edge growth (0-100 pixels)
  • style_preset (optional): Style preset

Cost: 5 credits per generation

POST /api/stability/edit/outpaint

Expand image in specified directions.

Parameters:

  • image (required): Image file to expand
  • left (optional): Pixels to expand left (0-2000)
  • right (optional): Pixels to expand right (0-2000)
  • up (optional): Pixels to expand up (0-2000)
  • down (optional): Pixels to expand down (0-2000)
  • creativity (optional): Creativity level (0-1)
  • prompt (optional): Guidance prompt

Note: At least one of left, right, up, or down must be greater than 0.

Cost: 4 credits per generation
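
The direction rule can be checked before uploading an image. This is a small sketch under the constraints stated above (each direction 0-2000 pixels, at least one direction set, creativity 0-1); `validate_outpaint` is a hypothetical helper and is not taken from the backend.

```python
from typing import Dict, Optional


def validate_outpaint(
    left: int = 0,
    right: int = 0,
    up: int = 0,
    down: int = 0,
    creativity: Optional[float] = None,
) -> Dict[str, int]:
    """Return only the non-zero directions, or raise ValueError."""
    directions = {"left": left, "right": right, "up": up, "down": down}
    for name, pixels in directions.items():
        if not 0 <= pixels <= 2000:
            raise ValueError(f"{name} must be between 0 and 2000 pixels")
    if creativity is not None and not 0 <= creativity <= 1:
        raise ValueError("creativity must be between 0 and 1")
    requested = {name: px for name, px in directions.items() if px > 0}
    if not requested:
        raise ValueError("at least one direction must be specified")
    return requested
```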

POST /api/stability/edit/search-and-replace

Replace objects using text prompts instead of masks.

Parameters:

  • image (required): Image file to edit
  • prompt (required): Description of replacement
  • search_prompt (required): What to search for
  • grow_mask (optional): Mask edge growth (0-20 pixels)

Cost: 5 credits per generation

POST /api/stability/edit/search-and-recolor

Change colors of specific objects using prompts.

Parameters:

  • image (required): Image file to edit
  • prompt (required): Description of new colors
  • select_prompt (required): What to select for recoloring

Cost: 5 credits per generation

POST /api/stability/edit/remove-background

Remove background from images.

Parameters:

  • image (required): Image file
  • output_format (optional): Output format (png, webp)

Cost: 5 credits per generation

Upscale Endpoints

POST /api/stability/upscale/fast

Fast 4x upscaling (~1 second processing).

Parameters:

  • image (required): Image file to upscale
  • output_format (optional): Output format

Cost: 2 credits per generation

POST /api/stability/upscale/conservative

Conservative upscaling to 4K with minimal changes.

Parameters:

  • image (required): Image file to upscale
  • prompt (required): Description for guidance
  • creativity (optional): Creativity level (0.2-0.5)

Cost: 40 credits per generation

POST /api/stability/upscale/creative

Creative upscaling for highly degraded images (async).

Parameters:

  • image (required): Image file to upscale
  • prompt (required): Description for guidance
  • creativity (optional): Creativity level (0.1-0.5)
  • style_preset (optional): Style preset

Cost: 60 credits per generation

Control Endpoints

POST /api/stability/control/sketch

Generate refined images from sketches.

Parameters:

  • image (required): Sketch or line art
  • prompt (required): Description of desired result
  • control_strength (optional): Control strength (0-1)

Cost: 5 credits per generation

POST /api/stability/control/structure

Maintain structure while changing content.

Parameters:

  • image (required): Structure reference image
  • prompt (required): Description of desired result
  • control_strength (optional): Control strength (0-1)

Cost: 5 credits per generation

POST /api/stability/control/style

Extract and apply style from reference image.

Parameters:

  • image (required): Style reference image
  • prompt (required): Description of desired result
  • aspect_ratio (optional): Output aspect ratio
  • fidelity (optional): Style fidelity (0-1)

Cost: 5 credits per generation

POST /api/stability/control/style-transfer

Transfer style between two images.

Parameters:

  • init_image (required): Image to restyle
  • style_image (required): Style reference
  • style_strength (optional): Style strength (0-1)
  • composition_fidelity (optional): Composition preservation (0-1)

Cost: 8 credits per generation

3D Endpoints

POST /api/stability/3d/stable-fast-3d

Generate 3D models from 2D images (fast).

Parameters:

  • image (required): 2D image to convert
  • texture_resolution (optional): Texture resolution (512, 1024, 2048)
  • foreground_ratio (optional): Object size ratio (0.1-1)
  • remesh (optional): Remesh algorithm (none, triangle, quad)

Output: GLB 3D model file

Cost: 10 credits per generation

POST /api/stability/3d/stable-point-aware-3d

Advanced 3D generation with editing capabilities.

Parameters:

  • image (required): 2D image to convert
  • texture_resolution (optional): Texture resolution
  • foreground_ratio (optional): Object size ratio (1-2)
  • target_type (optional): Simplification target (none, vertex, face)
  • guidance_scale (optional): Guidance scale (1-10)

Cost: 4 credits per generation

Audio Endpoints

POST /api/stability/audio/text-to-audio

Generate audio from text descriptions.

Parameters:

  • prompt (required): Audio description
  • duration (optional): Duration in seconds (1-190)
  • model (optional): Audio model (stable-audio-2, stable-audio-2.5)
  • steps (optional): Sampling steps (model-dependent)
  • cfg_scale (optional): CFG scale (1-25)

Cost: 20 credits per generation

POST /api/stability/audio/audio-to-audio

Transform audio using text instructions.

Parameters:

  • prompt (required): Transformation description
  • audio (required): Input audio file
  • duration (optional): Output duration (1-190)
  • strength (optional): Input influence (0-1)

Cost: 20 credits per generation

Results Endpoint

GET /api/stability/results/{generation_id}

Get results from async generations.

Parameters:

  • generation_id (required): ID from async operation
  • accept_type (optional): Response format preference

Response: Generated content or status update
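
Async operations such as Creative Upscale are typically consumed by polling this endpoint until the content is ready. The sketch below shows one way to structure that loop. It assumes that HTTP 202 signals "still in progress" and HTTP 200 carries the finished content; this document does not confirm that mapping for the ALwrity endpoint, so treat it as an assumption. `poll_result` and its `fetch` callable are hypothetical names.

```python
import time
from typing import Callable, Tuple


def poll_result(
    fetch: Callable[[], Tuple[int, bytes]],
    *,
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it reports completion, waiting between attempts.

    fetch returns (status_code, body). Assumed convention: 202 means the
    generation is still running and 200 means the body is the final result.
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"generation failed with HTTP {status}")
        sleep(interval)
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

In practice `fetch` would wrap a GET to `/api/stability/results/{generation_id}` and return the response's status code and content.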

Advanced Features

Workflow Processing

The integration supports complex multi-step workflows:

# Example workflow
workflow = [
    {"operation": "generate_core", "parameters": {"prompt": "a landscape"}},
    {"operation": "upscale_fast", "parameters": {}},
    {"operation": "inpaint", "parameters": {"prompt": "add a house"}}
]
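
To make the chaining concrete, here is a hedged sketch of how such a step list could be executed, with each step receiving the previous step's output. `run_workflow` and the handler registry are hypothetical and do not describe how the backend's workflow engine is implemented.

```python
from typing import Any, Callable, Dict, List, Optional

# A handler takes the previous step's output (None for the first step)
# plus the step parameters, and returns the new output bytes.
Handler = Callable[[Optional[bytes], Dict[str, Any]], bytes]


def run_workflow(
    steps: List[Dict[str, Any]],
    handlers: Dict[str, Handler],
    initial: Optional[bytes] = None,
) -> Optional[bytes]:
    """Run steps in order, feeding each result into the next step."""
    current = initial
    for index, step in enumerate(steps):
        name = step["operation"]
        if name not in handlers:
            raise KeyError(f"step {index}: unknown operation {name!r}")
        current = handlers[name](current, step.get("parameters", {}))
    return current
```

Failing fast on an unknown operation keeps a typo in a later step from being discovered only after earlier steps have already spent credits.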

Batch Processing

Process multiple images with the same operation:

POST /api/stability/advanced/batch/process-folder

Model Comparison

Compare results across different models:

POST /api/stability/advanced/compare/models

AI Director Mode

Automated creative decision-making:

POST /api/stability/advanced/experimental/ai-director

Configuration

Environment Variables

STABILITY_API_KEY=your_api_key_here
STABILITY_BASE_URL=https://api.stability.ai  # Optional
STABILITY_TIMEOUT=300                         # Optional
STABILITY_MAX_RETRIES=3                      # Optional
STABILITY_MAX_FILE_SIZE=10485760             # Optional (10MB)
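
A typed settings object is one way to read these variables. The sketch below is illustrative only: `StabilitySettings` and `load_settings` are hypothetical names (the real logic lives in `config/stability_config.py`), and it assumes the values shown above are the defaults used when an optional variable is unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StabilitySettings:
    api_key: str
    base_url: str = "https://api.stability.ai"
    timeout: int = 300                     # seconds
    max_retries: int = 3
    max_file_size: int = 10 * 1024 * 1024  # 10485760 bytes (10MB)


def load_settings(env: Mapping[str, str] = os.environ) -> StabilitySettings:
    """Build settings from environment variables, failing early without a key."""
    api_key = env.get("STABILITY_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("STABILITY_API_KEY is not set")
    return StabilitySettings(
        api_key=api_key,
        base_url=env.get("STABILITY_BASE_URL", "https://api.stability.ai").rstrip("/"),
        timeout=int(env.get("STABILITY_TIMEOUT", "300")),
        max_retries=int(env.get("STABILITY_MAX_RETRIES", "3")),
        max_file_size=int(env.get("STABILITY_MAX_FILE_SIZE", "10485760")),
    )
```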

Rate Limiting

  • Default Limit: 150 requests per 10 seconds
  • Timeout: 60 seconds when limit exceeded
  • Configurable: Can be adjusted in middleware
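
For intuition, a limit of 150 requests per 10 seconds can be enforced with a sliding window over recent request timestamps. The following is a simplified sketch and not the middleware's actual code: `SlidingWindowLimiter` is a hypothetical class, it tracks a single client, and it does not model the 60-second timeout applied after the limit is exceeded.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests within any `window`-second span."""

    def __init__(
        self,
        limit: int = 150,
        window: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True
```

The injectable `clock` exists so the limiter can be exercised in tests without real waiting.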

File Size Limits

  • Images: 10MB maximum
  • Audio: 50MB maximum
  • 3D Models: 10MB maximum

Image Requirements

Generate Operations

  • Minimum: 4,096 pixels total
  • Maximum: 16,777,216 pixels total (~16.8MP)
  • Dimensions: Each side at least 64 pixels

Edit Operations

  • Minimum: 4,096 pixels total
  • Maximum: 9,437,184 pixels total (~9.4MP)
  • Aspect Ratio: Between 1:2.5 and 2.5:1

Upscale Operations

  • Fast: 1,024 to 1,048,576 pixels total, each side 32-1536 pixels
  • Conservative: 4,096 to 9,437,184 pixels
  • Creative: 4,096 to 1,048,576 pixels
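
The limits in this section lend themselves to a single pre-upload check. The sketch below encodes the numbers listed above in a lookup table. The operation keys (`generate`, `edit`, `upscale_fast`, and so on) and the function name `check_dimensions` are hypothetical, chosen for illustration.

```python
from typing import Dict, List

# Pixel and side limits copied from the requirements listed in this section.
LIMITS: Dict[str, Dict[str, float]] = {
    "generate": {"min_pixels": 4096, "max_pixels": 16_777_216, "min_side": 64},
    "edit": {"min_pixels": 4096, "max_pixels": 9_437_184, "max_ratio": 2.5},
    "upscale_fast": {"min_pixels": 1024, "max_pixels": 1_048_576,
                     "min_side": 32, "max_side": 1536},
    "upscale_conservative": {"min_pixels": 4096, "max_pixels": 9_437_184},
    "upscale_creative": {"min_pixels": 4096, "max_pixels": 1_048_576},
}


def check_dimensions(kind: str, width: int, height: int) -> List[str]:
    """Return a list of violated limits; an empty list means the size is fine."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    rules = LIMITS[kind]
    pixels = width * height
    problems: List[str] = []
    if pixels < rules["min_pixels"]:
        problems.append(f"{pixels} pixels is below the minimum of {rules['min_pixels']}")
    if pixels > rules["max_pixels"]:
        problems.append(f"{pixels} pixels is above the maximum of {rules['max_pixels']}")
    if "min_side" in rules and min(width, height) < rules["min_side"]:
        problems.append(f"shortest side must be at least {rules['min_side']} pixels")
    if "max_side" in rules and max(width, height) > rules["max_side"]:
        problems.append(f"longest side must be at most {rules['max_side']} pixels")
    if "max_ratio" in rules and max(width, height) / min(width, height) > rules["max_ratio"]:
        problems.append(f"aspect ratio must be within 1:{rules['max_ratio']}")
    return problems
```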

Usage Examples

Basic Text-to-Image Generation

import requests

response = requests.post(
    "http://localhost:8000/api/stability/generate/ultra",
    data={
        "prompt": "A majestic mountain landscape at sunset",
        "aspect_ratio": "16:9",
        "style_preset": "photographic"
    }
)

if response.status_code == 200:
    with open("generated_image.png", "wb") as f:
        f.write(response.content)

Image Editing with Inpainting

import requests

# Open both files in a context manager so the handles are closed after the upload.
with open("input.png", "rb") as image, open("mask.png", "rb") as mask:
    response = requests.post(
        "http://localhost:8000/api/stability/edit/inpaint",
        files={"image": image, "mask": mask},
        data={
            "prompt": "a beautiful garden",
            "grow_mask": 10
        }
    )

Audio Generation

import requests

response = requests.post(
    "http://localhost:8000/api/stability/audio/text-to-audio",
    data={
        "prompt": "Peaceful piano music with nature sounds",
        "duration": 60,
        "model": "stable-audio-2.5"
    }
)

# On success the response body is the generated audio.
if response.status_code == 200:
    with open("generated_audio.mp3", "wb") as f:
        f.write(response.content)

3D Model Generation

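import requests

# Upload the source image; the context manager closes the file afterwards.
with open("object.png", "rb") as image:
    response = requests.post(
        "http://localhost:8000/api/stability/3d/stable-fast-3d",
        files={"image": image},
        data={
            "texture_resolution": "1024",
            "foreground_ratio": 0.85
        }
    )

# On success the response body is a binary GLB model.
if response.status_code == 200:
    with open("model.glb", "wb") as f:
        f.write(response.content)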

Error Handling

The API reports failures with standard HTTP status codes and a structured JSON error body:

Common Error Codes

  • 400: Invalid parameters or file format
  • 403: Content moderation flag or insufficient permissions
  • 413: File too large
  • 422: Request well-formed but rejected
  • 429: Rate limit exceeded
  • 500: Internal server error

Error Response Format

{
    "id": "error_id",
    "name": "error_name",
    "errors": ["Detailed error messages"]
}
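
A client can turn this structure into a readable message with a few lines of code. The sketch below assumes the body has the `id`, `name`, and `errors` fields shown above and falls back to the status-code descriptions from this section when the body is not JSON. `describe_error` is a hypothetical helper.

```python
import json
from typing import Union

# Descriptions taken from the "Common Error Codes" list.
STATUS_HINTS = {
    400: "Invalid parameters or file format",
    403: "Content moderation flag or insufficient permissions",
    413: "File too large",
    422: "Request well-formed but rejected",
    429: "Rate limit exceeded",
    500: "Internal server error",
}


def describe_error(status_code: int, body: Union[str, bytes]) -> str:
    """Summarise an error response as a single human-readable line."""
    hint = STATUS_HINTS.get(status_code, "Unexpected error")
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return f"HTTP {status_code}: {hint}"
    if not isinstance(payload, dict):
        return f"HTTP {status_code}: {hint}"
    name = payload.get("name", "unknown")
    errors = payload.get("errors") or []
    if isinstance(errors, str):
        errors = [errors]
    details = "; ".join(str(item) for item in errors)
    return f"HTTP {status_code} ({name}): {details or hint}"
```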

Monitoring and Analytics

Health Check Endpoints

  • GET /api/stability/health - Basic health check
  • GET /api/stability/admin/health/detailed - Comprehensive health check

Statistics Endpoints

  • GET /api/stability/admin/stats - Service statistics
  • GET /api/stability/admin/usage/summary - Usage summary
  • GET /api/stability/admin/request-logs - Request logs

Cost Estimation

  • GET /api/stability/admin/costs/estimate - Estimate operation costs
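
For a quick local estimate without calling the endpoint, the per-operation costs listed in this document can be collected into a lookup table. This is only a sketch: the operation names are hypothetical labels, SD3 is omitted because its cost depends on the model (2.5-6.5 credits), and the admin endpoint's actual request and response formats are not described here.

```python
from typing import Dict, Iterable

# Credits per generation, as listed in the endpoint sections of this document.
CREDITS: Dict[str, int] = {
    "generate_ultra": 8,
    "generate_core": 3,
    "erase": 5,
    "inpaint": 5,
    "outpaint": 4,
    "search_and_replace": 5,
    "search_and_recolor": 5,
    "remove_background": 5,
    "upscale_fast": 2,
    "upscale_conservative": 40,
    "upscale_creative": 60,
    "control_sketch": 5,
    "control_structure": 5,
    "control_style": 5,
    "style_transfer": 8,
    "stable_fast_3d": 10,
    "stable_point_aware_3d": 4,
    "text_to_audio": 20,
    "audio_to_audio": 20,
}


def estimate_credits(operations: Iterable[str]) -> int:
    """Sum the documented credit cost of each operation in the sequence."""
    total = 0
    for name in operations:
        if name not in CREDITS:
            raise KeyError(f"no documented cost for operation {name!r}")
        total += CREDITS[name]
    return total
```

For example, generating with Core, upscaling with Fast, and then inpainting costs 3 + 2 + 5 = 10 credits.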

Best Practices

Prompt Optimization

  1. Be Specific: Use detailed, descriptive language
  2. Include Style: Specify artistic style or photographic type
  3. Add Quality Terms: Include "high quality", "detailed", "sharp"
  4. Use Negative Prompts: Specify what you don't want

Image Preparation

  1. Check Dimensions: Ensure images meet size requirements
  2. Optimize File Size: Compress large images before upload
  3. Use Appropriate Formats: PNG for transparency, JPEG for photos
  4. Validate Aspect Ratios: Check ratio requirements for operations

Performance Optimization

  1. Use Appropriate Models: Choose model based on speed vs quality needs
  2. Batch Operations: Use batch endpoints for multiple similar operations
  3. Cache Results: Enable caching for repeated operations
  4. Monitor Usage: Track credit usage and optimize accordingly
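
Caching repeated operations needs a key that is identical for identical requests. One possible approach, sketched below, hashes the operation name, the parameters in a canonical order, and the uploaded file bytes. `cache_key` is a hypothetical helper, and the middleware's real caching scheme may differ.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(operation: str, params: Dict[str, Any], file_bytes: bytes = b"") -> str:
    """Return a stable SHA-256 hex digest identifying a request."""
    # sort_keys makes the JSON independent of dictionary insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(operation.encode("utf-8"))
    digest.update(b"\0")  # separator so adjacent fields cannot run together
    digest.update(canonical.encode("utf-8"))
    digest.update(b"\0")
    digest.update(file_bytes)
    return digest.hexdigest()
```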

Security Considerations

API Key Management

  • Store API keys securely in environment variables
  • Never commit API keys to version control
  • Rotate keys regularly
  • Monitor key usage for unauthorized access

Content Moderation

  • Built-in content moderation middleware
  • Configurable blocked terms
  • Automatic flagging of inappropriate content
  • Audit logging for compliance
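
As a rough picture of how a blocked-terms check might work, the sketch below matches whole words case-insensitively, so that a blocked term does not trigger on a longer word that merely contains it. `find_blocked_terms` is a hypothetical function. The actual middleware logic and its term list are not shown in this document.

```python
import re
from typing import Iterable, List


def find_blocked_terms(prompt: str, blocked_terms: Iterable[str]) -> List[str]:
    """Return the blocked terms that appear in the prompt as whole words."""
    hits: List[str] = []
    for term in blocked_terms:
        # \b anchors restrict the match to word boundaries.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            hits.append(term)
    return hits
```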

Rate Limiting

  • Automatic rate limiting per client
  • Configurable limits and timeouts
  • IP-based and API key-based limiting
  • Graceful handling of limit exceeded scenarios

Troubleshooting

Common Issues

"API key missing or invalid"

  • Check STABILITY_API_KEY environment variable
  • Verify key is correct and active
  • Check account balance

"Rate limit exceeded"

  • Wait for timeout period (60 seconds)
  • Implement request queuing
  • Consider upgrading API plan
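
Waiting out the timeout can be automated on the client side. Here is a minimal sketch that retries only on HTTP 429 and waits the documented 60 seconds between attempts. `call_with_rate_limit_retry` and its `send` callable are hypothetical names, and the retry count of 3 is an arbitrary choice.

```python
import time
from typing import Callable, Tuple


def call_with_rate_limit_retry(
    send: Callable[[], Tuple[int, bytes]],
    *,
    max_retries: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() and retry after a pause whenever it returns HTTP 429."""
    status, body = send()
    for _ in range(max_retries):
        if status != 429:
            break
        sleep(wait_seconds)
        status, body = send()
    # Either a non-429 response, or the last 429 once retries are exhausted.
    return status, body
```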

"File too large"

  • Compress images before upload
  • Check file size limits for operation
  • Use appropriate image formats

"Invalid image dimensions"

  • Check minimum/maximum pixel requirements
  • Validate aspect ratio constraints
  • Resize image if necessary

Debug Endpoints

  • POST /api/stability/admin/debug/test-connection - Test API connectivity
  • GET /api/stability/admin/debug/request-logs - View recent requests
  • POST /api/stability/utils/image-info - Analyze image properties

Integration Examples

React Frontend Integration

// Build the form data and request a generated image
const formData = new FormData();
formData.append('prompt', 'A beautiful landscape');
formData.append('aspect_ratio', '16:9');

const response = await fetch('/api/stability/generate/ultra', {
    method: 'POST',
    body: formData
});

if (response.ok) {
    const blob = await response.blob();
    const imageUrl = URL.createObjectURL(blob);
    // Display image
}

Python Service Integration

from typing import List

from services.stability_service import StabilityAIService

async def generate_content_images(prompts: List[str]):
    """Generate one Core image per prompt, reusing a single service session."""
    service = StabilityAIService()

    async with service:
        results = []
        for prompt in prompts:
            result = await service.generate_core(
                prompt=prompt,
                aspect_ratio="16:9"
            )
            results.append(result)

    return results

Performance Metrics

Typical Response Times

  • Fast Operations (Fast Upscale): ~1-2 seconds
  • Standard Operations (Core Generation): ~5-10 seconds
  • Complex Operations (Ultra Generation): ~10-20 seconds
  • Heavy Operations (Creative Upscale): ~30-60 seconds

Throughput

  • Rate Limit: 150 requests per 10 seconds
  • Concurrent Requests: Limited by API key
  • Batch Processing: Recommended for multiple operations

Future Enhancements

Planned Features

  1. Advanced Caching: Redis-based caching for better performance
  2. Queue Management: Async job queue for heavy operations
  3. Result Storage: Persistent storage for generated content
  4. Analytics Dashboard: Real-time usage analytics
  5. Custom Workflows: Visual workflow builder
  6. A/B Testing: Compare different approaches automatically

API Extensions

  1. Webhook Support: Real-time notifications for async operations
  2. Streaming Responses: Progressive image generation updates
  3. Template System: Predefined generation templates
  4. Collaboration Features: Shared workspaces and results

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review the test suite for usage examples
  3. Check Stability AI documentation: https://platform.stability.ai/docs
  4. Contact support through the admin panel

Version History

  • v1.0.0: Initial implementation with all major Stability AI features
    • Complete API coverage for v2beta endpoints
    • Legacy v1 API support
    • Comprehensive middleware and utilities
    • Full test suite and documentation