58 lines
1.6 KiB
Markdown
58 lines
1.6 KiB
Markdown
---
|
|
name: image-analyze
|
|
description: Analyze images using vision AI when the current model doesn't support image input. Use this skill when you need to understand, describe, or extract information from images.
|
|
---
|
|
|
|
# Image Analyze
|
|
|
|
Analyze images with vision AI via `python3 scripts/analyze_image.py <image_path> [prompt]`.
|
|
|
|
## Commands
|
|
|
|
| Command | Args | Description |
|
|
|---------|------|-------------|
|
|
| `analyze` | `<image_path> [prompt]` | Analyze image with optional custom prompt |
|
|
|
|
## Options
|
|
|
|
| Option | Default | Description |
|
|
|--------|---------|-------------|
|
|
| `--max-tokens` | 1024 | Maximum tokens in response |
|
|
| `--temperature` | 0.7 | Response creativity (0-2) |
|
|
| `--model` | moonshotai/Kimi-K2.5-TEE | Vision model to use |
|
|
|
|
## Examples
|
|
|
|
```bash
|
|
# Basic analysis
|
|
python3 scripts/analyze_image.py photo.jpg
|
|
|
|
# With custom prompt
|
|
python3 scripts/analyze_image.py diagram.png "Extract all text and explain the workflow"
|
|
|
|
# Detailed analysis
|
|
python3 scripts/analyze_image.py screenshot.png "Describe all UI elements and their positions"
|
|
|
|
# OCR-like extraction
|
|
python3 scripts/analyze_image.py document.jpg "Transcribe all text exactly as shown"
|
|
```
|
|
|
|
## Workflow
|
|
|
|
1. Provide image path (PNG, JPG, JPEG, GIF, WEBP, BMP)
|
|
2. Optionally provide custom analysis prompt
|
|
3. Script converts image to base64 and sends to vision API
|
|
4. Returns detailed analysis text
|
|
|
|
## Output Format
|
|
|
|
- Success: Analysis text directly
|
|
- Error: `Error: message` (to stderr)
|
|
|
|
## Notes
|
|
|
|
- Requires `CHUTES_API_TOKEN` in environment
|
|
- Uses Kimi-K2.5-TEE vision model via Chutes AI
|
|
- Supports common image formats
|
|
- Best for: image description, OCR, UI analysis, diagram interpretation
|