feat: add phantom-canvas skill for Gemini image generation
215
skills/phantom-canvas/SKILL.md
Normal file
@@ -0,0 +1,215 @@
---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys; it uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---

# Phantom Canvas

**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys and no billing: all you need is a Google account. A persistent Chrome browser runs in the background and automates Gemini's web UI over CDP (Chrome DevTools Protocol).

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.

## Quick Check

```bash
# Verify installation
phantom-canvas --help
```

If you see the logo and command list, it's installed. If not:

```bash
npm install -g phantom-canvas
```

## First-Time Setup (Required Once)

On the first run, Chrome must open in a visible window so you can log into your Google account:

```bash
# Open Chrome; a window appears. Log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```

After you log in once, the session persists in `~/.phantom-canvas/chrome-profile/`, and all later runs can be headless.

**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and log in again. Do NOT automate the login: Google auth requires human interaction.

## CLI Generate (for agents and scripts)

The result is printed as JSON on stdout; logs go to stderr. Images are downloaded at full resolution (1024px).

### Text-to-Image

```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```

Returns JSON:
```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```

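Because the CLI prints a single JSON object on stdout, a calling script can parse it directly. The following is a minimal sketch, assuming the four fields in the sample above are present on success; `GenerateResult` and `parse_generate_output` are illustrative names, not part of phantom-canvas.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerateResult:
    """Fields copied from the sample JSON above; any other fields are ignored."""
    status: str
    path: str
    type: str
    conversation_id: str


def parse_generate_output(stdout: str) -> GenerateResult:
    """Parse the JSON object that `phantom-canvas generate` prints on stdout."""
    data = json.loads(stdout)
    if data.get("status") != "completed":
        # Assumption: any status other than "completed" means no usable file.
        raise RuntimeError(f"generation did not complete: {data}")
    return GenerateResult(
        status=data["status"],
        path=data["path"],
        type=data["type"],
        conversation_id=data["conversation_id"],
    )


sample = '{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}'
result = parse_generate_output(sample)
print(result.path, result.conversation_id)
```

In a real script the input string would typically come from `subprocess.run(..., capture_output=True, text=True).stdout`, which keeps the stderr logs separate from the JSON.
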
### Image-to-Image (Reference Upload)

Use an existing image as a visual reference:

```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```

The reference path may be absolute or relative to the current working directory.

### Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine a result:

```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```

### Video Generation

```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```

Video generation takes 1-2 minutes. Gemini enforces daily video quotas.

### CLI Options

| Flag | Description |
|---|---|
| `-o, --output <file>` | Output file path |
| `--ref <file>` | Reference image (absolute path, or relative to cwd) |
| `--video` | Generate video instead of image |
| `--conversation <id>` | Continue previous conversation |
| `--timeout <secs>` | Timeout in seconds (default: 180 image, 300 video) |
| `--headed` | Show browser window (default: headless) |
| `--cdp <url>` | Chrome DevTools URL (default: http://127.0.0.1:9222) |

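When an agent or script assembles the command, building an argument list avoids shell-quoting problems with prompts that contain spaces or quotes. A small sketch using only the flags listed in the table; the function name `build_generate_argv` and its parameters are hypothetical helpers, not part of the tool.

```python
from typing import List, Optional


def build_generate_argv(
    prompt: str,
    output: Optional[str] = None,
    ref: Optional[str] = None,
    video: bool = False,
    conversation: Optional[str] = None,
    timeout: Optional[int] = None,
    headed: bool = False,
) -> List[str]:
    """Build an argv list for `phantom-canvas generate` from the documented flags."""
    argv = ["phantom-canvas", "generate", prompt]
    if output is not None:
        argv += ["--output", output]
    if ref is not None:
        argv += ["--ref", ref]
    if video:
        argv.append("--video")
    if conversation is not None:
        argv += ["--conversation", conversation]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if headed:
        argv.append("--headed")
    return argv


print(build_generate_argv("make the sword bigger", output="v2.png", conversation="abc123"))
```

The resulting list can be passed unchanged to `subprocess.run`, so the prompt never goes through a shell.
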
## HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

```bash
# Start the server
phantom-canvas serve [--port 8420]
```

### Generate an Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```

Returns immediately with a task ID:
```json
{"task_id": "abc123", "status": "queued"}
```

### Check Task Status

```bash
curl localhost:8420/task/abc123
```

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```

### Download Result

```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```

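The `url` values in the status response are server-relative paths, so a client has to join them with the server address before downloading. A minimal sketch of that step, assuming the response shape shown under Check Task Status; `image_urls` is an illustrative helper name.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def image_urls(base_url: str, task: Dict[str, Any]) -> List[str]:
    """Turn the relative `url` of each entry in `images` into a full download URL."""
    return [urljoin(base_url, item["url"]) for item in task.get("images", [])]


task = {
    "task_id": "abc123",
    "status": "completed",
    "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
}
print(image_urls("http://localhost:8420", task))
```
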
### With Reference Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'
```

> `reference_images` must be **absolute local file paths**.

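Since the API only accepts absolute paths, a client that starts from relative paths needs to resolve them before sending the request. A short sketch using the standard library; `absolute_reference_paths` is a hypothetical helper, and it resolves against the client's working directory, which only makes sense when client and server run on the same machine.

```python
from pathlib import Path
from typing import Iterable, List


def absolute_reference_paths(paths: Iterable[str]) -> List[str]:
    """Expand `~` and resolve each path against the current working directory."""
    return [str(Path(p).expanduser().resolve()) for p in paths]


print(absolute_reference_paths(["./sprite.png", "assets/knight.png"]))
```
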
### Check Server Health

```bash
curl localhost:8420/health
```

### API Parameters (POST /generate)

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | (none) | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
| `callback_url` | string | (none) | Webhook URL for async completion |
| `conversation_id` | string | (none) | Continue existing conversation |

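The table maps naturally onto a small payload builder that applies the documented defaults and leaves out unset optional fields. This is a sketch under the assumption that the server treats a missing field the same as its default; `build_generate_payload` and `output_type` are illustrative names.

```python
from typing import Any, Dict, List, Optional

# Default timeouts (seconds) as listed in the parameter table.
DEFAULT_TIMEOUT_SECS = {"image": 180, "video": 300}


def build_generate_payload(
    prompt: str,
    output_type: str = "image",
    reference_images: Optional[List[str]] = None,
    num_images: int = 1,
    timeout_secs: Optional[int] = None,
    callback_url: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /generate from the documented fields."""
    if not prompt:
        raise ValueError("prompt is required")
    if output_type not in DEFAULT_TIMEOUT_SECS:
        raise ValueError(f"type must be one of {sorted(DEFAULT_TIMEOUT_SECS)}")
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "type": output_type,
        "num_images": num_images,
        "timeout_secs": (
            timeout_secs if timeout_secs is not None else DEFAULT_TIMEOUT_SECS[output_type]
        ),
    }
    optional = {
        "reference_images": reference_images,
        "callback_url": callback_url,
        "conversation_id": conversation_id,
    }
    # Only include optional fields that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_generate_payload("pixel art mage", output_type="video"))
```
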
## Best Practices for Website Images

### Use #00FF00 Green Background

Instead of asking for a "transparent background", use a prompt like:

```
"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
```

Gemini tends to render "transparent" as a literal checkerboard pattern. A solid green background is easy to key out in post-processing.

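The post-processing step can be as simple as making every near-green pixel transparent. The sketch below works on a plain list of RGB tuples to stay dependency-free; reading and writing the PNG itself would need an imaging library, which is omitted here. The function name `key_out_green` and the `tolerance` threshold are assumptions for illustration, and real images usually need tuning at anti-aliased edges.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def key_out_green(pixels: List[RGB], tolerance: int = 60) -> List[RGBA]:
    """Give pixels close to pure green (#00FF00) an alpha of 0, all others 255."""
    keyed: List[RGBA] = []
    for r, g, b in pixels:
        # A pixel counts as background when each channel is within
        # `tolerance` of the key color (0, 255, 0).
        is_key = r <= tolerance and g >= 255 - tolerance and b <= tolerance
        keyed.append((r, g, b, 0 if is_key else 255))
    return keyed


print(key_out_green([(0, 255, 0), (200, 30, 30), (10, 250, 5)]))
```
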
### Iterative Workflow

For website hero images or banners, use multi-turn:

1. Generate a base concept first
2. Use `--conversation` to refine colors, layout, and text
3. Download the final version

### Prompt Tips

- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
- Specify dimensions and composition when possible
- For website images, mention the context: "hero banner for a tech startup website"

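When prompts are generated by a script, the tips above can be applied by composing the prompt from separate parts. A minimal sketch; `build_prompt` and its parameter names are illustrative, and the comma-separated format is just one reasonable choice rather than anything Gemini requires.

```python
from typing import Optional


def build_prompt(
    subject: str,
    style: Optional[str] = None,
    composition: Optional[str] = None,
    context: Optional[str] = None,
) -> str:
    """Join the non-empty parts into one comma-separated prompt string."""
    parts = [subject, style, composition, context]
    return ", ".join(p.strip() for p in parts if p and p.strip())


print(build_prompt(
    "mountain landscape at sunrise",
    style="flat design",
    composition="wide 16:9 layout with empty space on the left",
    context="hero banner for a tech startup website",
))
```
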
## Error Handling

| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set Chrome path |
| "Session expired" | Run `phantom-canvas chrome` and log in again in the browser window |
| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Pass a different port, e.g. `--port 8430` |

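The "retry with a longer timeout" advice can be wrapped in a small helper. In this sketch the actual generation call is injected as a function, so the retry logic stays independent of how the CLI or API is invoked; `retry_with_longer_timeout`, the growth factor, and the convention that the callable returns `None` on timeout are all assumptions made for illustration.

```python
from typing import Callable, Optional


def retry_with_longer_timeout(
    run: Callable[[int], Optional[str]],
    first_timeout: int = 180,
    attempts: int = 3,
    factor: float = 1.5,
) -> str:
    """Call run(timeout) until it returns a path, growing the timeout each attempt."""
    timeout = first_timeout
    for _ in range(attempts):
        result = run(timeout)
        if result is not None:
            return result
        timeout = int(timeout * factor)
    raise TimeoutError(f"no result after {attempts} attempts")


def fake_run(timeout: int) -> Optional[str]:
    """Stand-in for a real generate call: succeeds once the timeout is long enough."""
    return "output.png" if timeout >= 270 else None


print(retry_with_longer_timeout(fake_run))
```
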
BIN
skills/phantom-canvas/logo.png
Normal file
Binary file not shown.
Size: 379 KiB
28
skills/phantom-canvas/references/AGENTS.md
Normal file
@@ -0,0 +1,28 @@
# AGENTS.md

CLI tool and HTTP API for AI image/video generation via Gemini Web.

## What this tool does

Phantom Canvas wraps Gemini Web as a programmable CLI and HTTP API. It launches Chrome via CDP, automates Gemini's web UI, and exposes generation capabilities for AI agents, scripts, and applications.

## How to use

```bash
bun add -g phantom-canvas   # or: npm install -g phantom-canvas
phantom-canvas generate "your prompt" -o output.png --headed   # first time: log in via Chrome
phantom-canvas generate "your prompt" -o output.png            # after that: headless
```

See [SKILL.md](../SKILL.md) for complete agent instructions.

## Architecture

- `index.ts`: CLI entry point (chrome / generate / serve)
- `lib/browser.ts`: Browser automation (Chrome CDP + Playwright)
- `lib/tasks.ts`: Async task queue
- `dist/index.js`: Compiled Node.js bundle

## Session

Chrome stores the login session in `~/.phantom-canvas/chrome-profile/`. The first run requires `--headed` so you can log in interactively. After that, the login persists and headless mode works.

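An agent can use the profile location to decide whether to suggest a headed first run. The sketch below only checks that the directory exists and is non-empty; it cannot tell whether the stored Google session is still valid, so "Session expired" still has to be handled at run time. `chrome_profile_dir` and `has_saved_profile` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional


def chrome_profile_dir(home: Optional[Path] = None) -> Path:
    """Return the profile path described above, relative to the given home directory."""
    base = home if home is not None else Path.home()
    return base / ".phantom-canvas" / "chrome-profile"


def has_saved_profile(home: Optional[Path] = None) -> bool:
    """True if the profile directory exists and contains at least one entry."""
    profile = chrome_profile_dir(home)
    return profile.is_dir() and any(profile.iterdir())


if not has_saved_profile():
    print("No saved Chrome profile found; run `phantom-canvas chrome` and log in first.")
```
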
BIN
skills/phantom-canvas/references/diagram.png
Normal file
Binary file not shown.
Size: 5.9 MiB
307
skills/phantom-canvas/references/examples.md
Normal file
@@ -0,0 +1,307 @@
# Phantom Canvas: Usage Examples

## Setup

```bash
bun install -g github:baixianger/phantom-canvas
phantom-canvas chrome    # first time: log into Google in the Chrome window
phantom-canvas serve     # start the server on :8420
```

---

## 1. Text-to-Image

Generate an image from a text prompt. Each request starts a new Gemini conversation.

```bash
# Submit
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Isometric pixel art knight with sword and shield, Final Fantasy Tactics style, on solid green #00FF00 chroma-key background, standing idle pose"
  }'
# => {"task_id": "abc123", "status": "queued"}

# Poll status
curl localhost:8420/task/abc123
# => {"status": "completed", "conversation_id": "05695dfd143c4dad", "images": [...]}

# Download image
curl localhost:8420/task/abc123/image/0 -o knight.png
```

---

## 2. Image-to-Image (Reference Upload)

Upload a reference image so Gemini keeps the same character design.

```bash
# Generate anchor sprite first
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "SE-facing isometric pixel art pirate, red bandana, blue tunic, FFT style, #00FF00 bg"}'
# Wait for the task to complete, then download the result as pirate.png

# Use it as reference for 4-direction sheet
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Using the uploaded character, create a 2x2 sprite sheet: top-left=North (back), top-right=East (right side), bottom-left=South (front), bottom-right=SE (same as ref). Same pixel art style, same green #00FF00 background.",
    "reference_images": ["/absolute/path/to/pirate.png"]
  }'
```

> `reference_images` must be **absolute local file paths**. The browser uploads them through Gemini's file upload UI.

---

## 3. Multi-Turn Conversation

Continue in the same Gemini chat to iterate on a design. Pass the `conversation_id` from a previous task.

```bash
# Step 1: initial generation
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Pixel art knight character, isometric, green background"}'
# => {"task_id": "aaa", "status": "queued"}

# Get conversation_id from result
curl localhost:8420/task/aaa
# => {"conversation_id": "05695dfd143c4dad", "images": [...]}

# Step 2: refine in same conversation; Gemini remembers context
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Now make the sword larger and add a red cape",
    "conversation_id": "05695dfd143c4dad"
  }'

# Step 3: generate variations
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Show this character from 4 different angles in a 2x2 grid",
    "conversation_id": "05695dfd143c4dad"
  }'
```

> Multi-turn is useful for iterative design. Gemini keeps the visual context from previous messages.

---

## 4. Video Generation

Generate walk cycle animations. Video generation takes 1-2 minutes, and Gemini enforces daily video quotas.

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Short looping video of a pixel art knight walking in place, isometric view, Final Fantasy Tactics style",
    "type": "video",
    "timeout_secs": 300
  }'
```

With reference image:

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Looping walk cycle animation of this exact character",
    "reference_images": ["/absolute/path/to/knight.png"],
    "type": "video",
    "timeout_secs": 300
  }'
```

---

## 5. Webhook Callback

Get notified when generation completes instead of polling.

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "pixel art mage with staff, isometric, green bg",
    "callback_url": "http://localhost:3000/webhook"
  }'
```

Your webhook receives:

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0"}]
}
```

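On the receiving side, the handler only has to decode the JSON body and pick out the image paths. The sketch below uses the standard library and assumes the callback arrives as an HTTP POST with a JSON body shaped like the sample above, which this document does not state explicitly; `handle_callback` and `WebhookHandler` are illustrative names. The server is defined but not started; it could be run with `HTTPServer(("localhost", 3000), WebhookHandler).serve_forever()`.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import List


def handle_callback(body: bytes) -> List[str]:
    """Return the image URL paths from a webhook payload, or [] if not completed."""
    payload = json.loads(body)
    if payload.get("status") != "completed":
        return []
    return [image["url"] for image in payload.get("images", [])]


class WebhookHandler(BaseHTTPRequestHandler):
    """Minimal receiver; assumes the callback is a POST with a JSON body."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        urls = handle_callback(self.rfile.read(length))
        print("finished images:", urls)
        self.send_response(204)  # acknowledge without a response body
        self.end_headers()


sample = b'{"task_id": "abc123", "status": "completed", "images": [{"index": 0, "url": "/task/abc123/image/0"}]}'
print(handle_callback(sample))
```
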
---

## 6. Full Pipeline: Game Asset Turnaround

Complete workflow for generating an 8-way isometric sprite sheet:

```bash
API=http://localhost:8420
OUT=./sprites
mkdir -p "$OUT"

# Poll until the task reaches a terminal status. A new task starts as "queued",
# so looping only while the status is "running" would exit too early.
wait_for() {
  local status
  while :; do
    status=$(curl -s "$API/task/$1" | jq -r .status)
    case "$status" in
      completed) return 0 ;;
      failed) echo "Task $1 failed" >&2; return 1 ;;
    esac
    sleep "${2:-10}"
  done
}

# Stage 1: Anchor sprite
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d '{
  "prompt": "Single SE-facing isometric pixel art knight, dark armor, red cape, sword and shield, FFT style, solid #00FF00 green background, no shadow"
}' | jq -r .task_id)

echo "Stage 1: $TASK"
wait_for "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/anchor.png"
CONV=$(curl -s "$API/task/$TASK" | jq -r .conversation_id)
echo "Anchor saved. Conversation: $CONV"

# Stage 2: Cardinal facings (multi-turn, Gemini remembers the knight)
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
  \"prompt\": \"Now create a 2x2 sprite sheet of this SAME knight from 4 angles: top-left=North (back view), top-right=East (right side), bottom-left=South (front view), bottom-right=SE (same as before). Same style, same green background.\",
  \"conversation_id\": \"$CONV\"
}" | jq -r .task_id)

echo "Stage 2: $TASK"
wait_for "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/cardinals.png"

# Stage 3: Diagonal facings
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
  \"prompt\": \"Now create 4 diagonal views in a 2x2 grid: NW (mostly back + left side), NE (mostly back + right side), SW (mostly front + left side), SE (mostly front + right side). Same character, same style.\",
  \"conversation_id\": \"$CONV\"
}" | jq -r .task_id)

echo "Stage 3: $TASK"
wait_for "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/diagonals.png"

# Stage 4: Assembly (local code, no API needed)
# python3 assemble.py $OUT/cardinals.png $OUT/diagonals.png $OUT/turnaround.png

# Stage 5: Walk animation
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
  \"prompt\": \"Create a short looping walk cycle video of this knight, isometric SW-facing, walking in place\",
  \"conversation_id\": \"$CONV\",
  \"type\": \"video\",
  \"timeout_secs\": 300
}" | jq -r .task_id)

echo "Stage 5: $TASK"
wait_for "$TASK" 15 || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/walk.mp4"

echo "Done! Files in $OUT/"
ls -la "$OUT/"
```

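The polling step in this pipeline is where most scripts go wrong, because a task passes through more than one non-final status. A minimal sketch of a polling loop that stops on either terminal status and gives up after a deadline; the status fetch is injected as a function so the logic can be exercised without a running server. `wait_for_task` and its parameters are illustrative, and the statuses `"completed"` and `"failed"` are taken from the client examples in this document.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_task: Callable[[], Dict[str, Any]],
    interval_secs: float = 10.0,
    max_wait_secs: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_task() until the task is completed, failed, or the deadline passes."""
    waited = 0.0
    while True:
        task = fetch_task()
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":
            raise RuntimeError(task.get("error", "task failed"))
        if waited >= max_wait_secs:
            raise TimeoutError(f"task still '{status}' after {waited:.0f}s")
        sleep(interval_secs)
        waited += interval_secs


# Simulated status sequence: queued, then running, then completed.
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "conversation_id": "05695dfd143c4dad"},
])
done = wait_for_task(lambda: next(responses), sleep=lambda secs: None)
print(done["conversation_id"])
```
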
---

## 7. TypeScript/Bun Client

```typescript
const API = "http://localhost:8420";

async function generate(opts: {
  prompt: string;
  type?: "image" | "video";
  referenceImages?: string[];
  conversationId?: string;
}) {
  // Submit task
  const { task_id } = await fetch(`${API}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: opts.prompt,
      type: opts.type ?? "image",
      reference_images: opts.referenceImages,
      conversation_id: opts.conversationId,
    }),
  }).then((r) => r.json());

  // Poll until done
  while (true) {
    const task = await fetch(`${API}/task/${task_id}`).then((r) => r.json());
    if (task.status === "completed") return task;
    if (task.status === "failed") throw new Error(task.error);
    await Bun.sleep(5000);
  }
}

// Usage
const anchor = await generate({
  prompt: "Isometric pixel art knight, FFT style, green #00FF00 bg",
});
console.log("Anchor:", anchor.images[0].url);

// Multi-turn: iterate on the same character
const refined = await generate({
  prompt: "Make the sword bigger and add a glowing effect",
  conversationId: anchor.conversation_id,
});

// Save image
const img = await fetch(`${API}${refined.images[0].url}`);
await Bun.write("knight.png", img);
```

---

## 8. Python Client

```python
import time

import requests

API = "http://localhost:8420"


def generate(prompt, type="image", reference_images=None, conversation_id=None, timeout=180):
    """Submit a generation task and wait for the result."""
    payload = {
        "prompt": prompt,
        "type": type,
        "reference_images": reference_images,
        "conversation_id": conversation_id,
        "timeout_secs": timeout,
    }
    # Send only the fields that are set, so unset options are omitted
    # instead of being sent as null.
    payload = {key: value for key, value in payload.items() if value is not None}
    resp = requests.post(f"{API}/generate", json=payload)
    resp.raise_for_status()
    task_id = resp.json()["task_id"]

    # Poll until the task reaches a terminal status.
    while True:
        task = requests.get(f"{API}/task/{task_id}").json()
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(task["error"])
        time.sleep(5)


def download(task, index=0, path="output.png"):
    """Download a generated file."""
    url = f"{API}{task['images'][index]['url']}"
    with open(path, "wb") as f:
        f.write(requests.get(url).content)


# Text-to-image
result = generate("pixel art knight, isometric, green bg")
download(result, path="knight.png")

# Multi-turn
result2 = generate(
    "Now show 4 directions in a 2x2 grid",
    conversation_id=result["conversation_id"],
)
download(result2, path="directions.png")

# Video
video = generate(
    "Walk cycle animation of this knight",
    type="video",
    conversation_id=result["conversation_id"],
    timeout=300,
)
download(video, path="walk.mp4")
```