feat: add phantom-canvas skill for Gemini image generation
215
skills/phantom-canvas/SKILL.md
Normal file
@@ -0,0 +1,215 @@
---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys; uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---

# Phantom Canvas

**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing, just your Google account. A persistent Chrome browser runs in the background via CDP, automating Gemini's web UI.

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.

## Quick Check

```bash
# Verify installation
phantom-canvas --help
```

If you see the logo and command list, it's installed. If not:

```bash
npm install -g phantom-canvas
```

## First-Time Setup (Required Once)

The first time, Chrome needs to open visibly so you can log into your Google account:

```bash
# Open Chrome. A window appears; log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```

After logging in once, the session persists at `~/.phantom-canvas/chrome-profile/`. All future runs can be headless.

**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and log in again. Do NOT automate login: it requires human interaction with Google auth.

## CLI Generate (for agents and scripts)

Output is JSON on stdout. Logs go to stderr. Images are downloaded at full resolution (1024px).

### Text-to-Image

```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```

Returns JSON:

```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```

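
For scripts that call the CLI from Python, the result can be parsed with the standard library. This is a minimal sketch, assuming stdout holds exactly one JSON object shaped like the example above; `parse_result` is a hypothetical helper name and not part of phantom-canvas.

```python
import json


def parse_result(stdout: str) -> dict:
    """Parse the JSON object that `phantom-canvas generate` prints on stdout.

    Assumption: stdout holds a single JSON object with the fields shown in
    the example above; any status other than "completed" is treated as a
    failure in this sketch.
    """
    result = json.loads(stdout.strip())
    if result.get("status") != "completed":
        raise RuntimeError(f"generation did not complete: {result}")
    return result


# Example using the sample output from this document (no CLI call is made).
sample = '{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}'
info = parse_result(sample)
print(info["path"], info["conversation_id"])
```

In a real script the `stdout` argument would come from something like `subprocess.run(..., capture_output=True, text=True).stdout`, with stderr left for the logs.
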
### Image-to-Image (Reference Upload)

Use an existing image as visual reference:

```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```

The reference path must be absolute or relative to the current working directory.

### Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine:

```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```

### Video Generation

```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```

Video generation takes 1-2 minutes. Gemini has daily video quotas.

### CLI Options

| Flag | Description |
|---|---|
| `-o, --output <file>` | Output file path |
| `--ref <file>` | Reference image (absolute path or relative to cwd) |
| `--video` | Generate video instead of image |
| `--conversation <id>` | Continue previous conversation |
| `--timeout <secs>` | Timeout in seconds (default: 180 image, 300 video) |
| `--headed` | Show browser window (default: headless) |
| `--cdp <url>` | Chrome DevTools URL (default: http://127.0.0.1:9222) |

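
When an agent drives the CLI from Python, building the argument list in one place avoids shell-quoting mistakes. The sketch below only uses flags from the table above; the function name `build_generate_argv` and its parameters are illustrative assumptions.

```python
import shlex
from typing import List, Optional


def build_generate_argv(
    prompt: str,
    output: Optional[str] = None,
    ref: Optional[str] = None,
    conversation: Optional[str] = None,
    video: bool = False,
    timeout: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `phantom-canvas generate` from the flags above."""
    argv = ["phantom-canvas", "generate", prompt]
    if output:
        argv += ["-o", output]
    if ref:
        argv += ["--ref", ref]
    if conversation:
        argv += ["--conversation", conversation]
    if video:
        argv.append("--video")
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    return argv


# Round 2 of the multi-turn example, expressed as an argv list.
argv = build_generate_argv("make the sword bigger", output="v2.png", conversation="abc123")
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

Passing the list straight to `subprocess.run(argv, ...)` keeps the prompt as a single argument even when it contains quotes or spaces.
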
## HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

```bash
# Start the server
phantom-canvas serve [--port 8420]
```

### Generate an Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```

Returns immediately with a task ID:

```json
{"task_id": "abc123", "status": "queued"}
```

### Check Task Status

```bash
curl localhost:8420/task/abc123
```

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```

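
Because `/generate` returns before the image exists, callers need a polling loop around the status endpoint. The following is a minimal sketch with an injectable fetch function so it can run without a server. It assumes `"completed"` and `"failed"` are the terminal statuses (both appear in this document's client examples) and that any other status means the task is still in progress; `wait_for_task` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[], dict],
    poll_secs: float = 5.0,
    max_wait_secs: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal status or the deadline passes.

    Assumption: "completed" and "failed" are terminal; anything else
    (for example "queued") means "keep waiting".
    """
    waited = 0.0
    while True:
        task = fetch_task()
        if task.get("status") in ("completed", "failed"):
            return task
        if waited >= max_wait_secs:
            raise TimeoutError(f"task still {task.get('status')!r} after {waited:.0f}s")
        sleep(poll_secs)
        waited += poll_secs


# Demo with canned responses instead of a live server.
responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "completed"}])
final = wait_for_task(lambda: next(responses), sleep=lambda _: None)
print(final["status"])
```

In real use, `fetch_task` would wrap a GET request to `/task/<task_id>` and return the decoded JSON.
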
### Download Result

```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```

### With Reference Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'
```

> `reference_images` must be **absolute local file paths**.

### Check Server Health

```bash
curl localhost:8420/health
```

### API Parameters (POST /generate)

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | none | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
| `callback_url` | string | none | Webhook URL for async completion |
| `conversation_id` | string | none | Continue existing conversation |

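
A small builder can catch the most common mistakes (missing prompt, relative reference path) before the request is sent. This is a sketch using only the field names from the table above; `build_payload` itself is a hypothetical helper, and the validation rules are this example's reading of the table rather than the server's actual checks.

```python
import json
import os
from typing import List, Optional


def build_payload(
    prompt: str,
    type: str = "image",
    reference_images: Optional[List[str]] = None,
    conversation_id: Optional[str] = None,
    timeout_secs: Optional[int] = None,
) -> dict:
    """Build a POST /generate body, omitting fields that were not set."""
    if not prompt:
        raise ValueError("prompt is required")
    if type not in ("image", "video"):
        raise ValueError('type must be "image" or "video"')
    payload = {"prompt": prompt, "type": type}
    if reference_images:
        for path in reference_images:
            # The API docs above require absolute local paths.
            if not os.path.isabs(path):
                raise ValueError(f"reference image must be an absolute path: {path}")
        payload["reference_images"] = list(reference_images)
    if conversation_id:
        payload["conversation_id"] = conversation_id
    if timeout_secs is not None:
        payload["timeout_secs"] = timeout_secs
    return payload


body = build_payload(
    "same character from 4 angles in a 2x2 grid",
    reference_images=["/absolute/path/to/sprite.png"],
)
print(json.dumps(body, indent=2))
```
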
## Best Practices for Website Images

### Use #00FF00 Green Background

Instead of asking for "transparent background", use:

```
"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
```

Gemini interprets "transparent" as a checkerboard pattern. A solid green screen is easy to chroma-key in post-processing.

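
The post-processing step can be as simple as a per-pixel threshold. The sketch below is library-free and works on plain RGB tuples; the tolerance of 60 is an arbitrary starting value, and `chroma_key` is a hypothetical helper. A real pipeline would read and write pixels with an imaging library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def chroma_key(pixels: List[RGB], tolerance: int = 60) -> List[RGBA]:
    """Make pixels near pure green (#00FF00) fully transparent.

    A pixel counts as background when each channel is within `tolerance`
    of (0, 255, 0). All other pixels stay fully opaque.
    """
    keyed = []
    for r, g, b in pixels:
        is_green = r <= tolerance and g >= 255 - tolerance and b <= tolerance
        keyed.append((r, g, b, 0 if is_green else 255))
    return keyed


# Pure green, near-green, and a red pixel that must stay opaque.
row = [(0, 255, 0), (10, 240, 12), (200, 30, 30)]
print(chroma_key(row))
```

A hard threshold leaves a green fringe on anti-aliased edges; pixel art with crisp edges is the case where this simple approach works best.
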
### Iterative Workflow

For website hero images or banners, use multi-turn:

1. First generate a base concept
2. Use `--conversation` to refine colors, layout, text
3. Download the final version

### Prompt Tips

- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
- Specify dimensions and composition when possible
- For website images, mention the context: "hero banner for a tech startup website"

## Error Handling

| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set the Chrome path |
| "Session expired" | Run `phantom-canvas chrome` and log in again in the browser window |
| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Use `--port 8430` or another free port |

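
An agent wrapper can turn the table above into a lookup so that each failure produces a concrete next step. This is only a sketch: "Session expired" is the one message this document quotes literally, so the other substrings are guesses at the wording, and `suggest_action` is a hypothetical helper.

```python
from typing import Optional

# Substring -> suggested action, taken from the table above.
# Assumption: apart from "Session expired", these substrings are
# illustrative guesses and may not match the tool's real messages.
ACTIONS = {
    "session expired": "Ask the user to run `phantom-canvas chrome` and log in again",
    "chrome failed to start": "Install Chrome or set the Chrome path",
    "quota": "Wait until the daily video quota resets",
    "port already in use": "Restart the server with a different --port",
    "timeout": "Retry with a different prompt or a longer --timeout",
}


def suggest_action(stderr_text: str) -> Optional[str]:
    """Return the first matching action for an error message, or None."""
    lowered = stderr_text.lower()
    for needle, action in ACTIONS.items():
        if needle in lowered:
            return action
    return None


print(suggest_action("Error: Session expired, please log in"))
```
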
BIN
skills/phantom-canvas/logo.png
Normal file
Binary file not shown.
Size: 379 KiB
28
skills/phantom-canvas/references/AGENTS.md
Normal file
@@ -0,0 +1,28 @@
# AGENTS.md

CLI tool and HTTP API for AI image/video generation via Gemini Web.

## What this tool does

Phantom Canvas wraps Gemini Web as a programmable CLI and HTTP API. It launches Chrome via CDP, automates Gemini's web UI, and exposes generation capabilities for AI agents, scripts, and applications.

## How to use

```bash
bun add -g phantom-canvas  # or: npm install -g phantom-canvas
phantom-canvas generate "your prompt" -o output.png --headed  # first time: log in via the Chrome window
phantom-canvas generate "your prompt" -o output.png  # after that: headless
```

See [SKILL.md](../SKILL.md) for complete agent instructions.

## Architecture

- `index.ts`: CLI entry point (chrome / generate / serve)
- `lib/browser.ts`: Browser automation (Chrome CDP + Playwright)
- `lib/tasks.ts`: Async task queue
- `dist/index.js`: Compiled Node.js bundle

## Session

Chrome stores the login in `~/.phantom-canvas/chrome-profile/`. The first run requires `--headed` so you can log in interactively. After that, the login persists and headless mode works.

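
A script can use the profile directory as a rough hint for whether a headed first run is still needed. The sketch below relies only on the path stated above; the helper names are hypothetical, and a non-empty directory does not prove that the Google session is still valid.

```python
from pathlib import Path
from typing import Optional


def profile_dir(home: Optional[Path] = None) -> Path:
    """Return the Chrome profile directory named in this document."""
    return (home or Path.home()) / ".phantom-canvas" / "chrome-profile"


def needs_first_login(home: Optional[Path] = None) -> bool:
    """Heuristic: a missing or empty profile directory suggests no prior login.

    Assumption: this only shows whether Chrome has ever run with the
    profile; an expired session still needs a headed re-login.
    """
    path = profile_dir(home)
    return not path.is_dir() or not any(path.iterdir())


# With no argument this inspects the real home directory; a made-up home
# is used here so the example has a predictable result.
print(needs_first_login(Path("/nonexistent-home")))
```
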
BIN
skills/phantom-canvas/references/diagram.png
Normal file
Binary file not shown.
Size: 5.9 MiB
307
skills/phantom-canvas/references/examples.md
Normal file
@@ -0,0 +1,307 @@
# Phantom Canvas: Usage Examples

## Setup

```bash
bun install -g github:baixianger/phantom-canvas
phantom-canvas chrome   # first time: log in to Google
phantom-canvas serve    # start server on :8420
```

---

## 1. Text-to-Image

Generate an image from a text prompt. Each request starts a new Gemini conversation.

```bash
# Submit
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Isometric pixel art knight with sword and shield, Final Fantasy Tactics style, on solid green #00FF00 chroma-key background, standing idle pose"
  }'
# => {"task_id": "abc123", "status": "queued"}

# Poll status
curl localhost:8420/task/abc123
# => {"status": "completed", "conversation_id": "05695dfd143c4dad", "images": [...]}

# Download image
curl localhost:8420/task/abc123/image/0 -o knight.png
```

---

## 2. Image-to-Image (Reference Upload)

Upload a reference image so Gemini keeps the same character design.

```bash
# Generate anchor sprite first
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "SE-facing isometric pixel art pirate, red bandana, blue tunic, FFT style, #00FF00 bg"}'
# Wait for completion, then download the result as pirate.png

# Use it as reference for 4-direction sheet
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Using the uploaded character, create a 2x2 sprite sheet: top-left=North (back), top-right=East (right side), bottom-left=South (front), bottom-right=SE (same as ref). Same pixel art style, same green #00FF00 background.",
    "reference_images": ["/absolute/path/to/pirate.png"]
  }'
```

> `reference_images` must be **absolute local file paths**. The browser uploads them through Gemini's file upload UI.

---

## 3. Multi-Turn Conversation

Continue in the same Gemini chat to iterate on a design. Pass `conversation_id` from a previous task.

```bash
# Step 1: initial generation
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Pixel art knight character, isometric, green background"}'
# => {"task_id": "aaa", "status": "queued"}

# Get conversation_id from result
curl localhost:8420/task/aaa
# => {"conversation_id": "05695dfd143c4dad", "images": [...]}

# Step 2: refine in same conversation (Gemini remembers context)
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Now make the sword larger and add a red cape",
    "conversation_id": "05695dfd143c4dad"
  }'

# Step 3: generate variations
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Show this character from 4 different angles in a 2x2 grid",
    "conversation_id": "05695dfd143c4dad"
  }'
```

> Multi-turn is useful for iterative design. Gemini keeps the visual context from previous messages.

---

## 4. Video Generation

Generate walk cycle animations. Generation takes 1-2 minutes. Gemini has daily video quotas.

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Short looping video of a pixel art knight walking in place, isometric view, Final Fantasy Tactics style",
    "type": "video",
    "timeout_secs": 300
  }'
```

With reference image:

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Looping walk cycle animation of this exact character",
    "reference_images": ["/path/to/knight.png"],
    "type": "video",
    "timeout_secs": 300
  }'
```

---

## 5. Webhook Callback

Get notified when generation completes instead of polling.

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "pixel art mage with staff, isometric, green bg",
    "callback_url": "http://localhost:3000/webhook"
  }'
```

Your webhook receives:

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0"}]
}
```

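
To try the callback locally, a few lines of standard-library Python are enough to receive it. This is a minimal sketch, assuming the callback arrives as an HTTP POST with a JSON body shaped like the example above; the handler and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> str:
    """Turn a callback body (shaped like the JSON above) into a log line."""
    event = json.loads(body)
    urls = [img["url"] for img in event.get("images", [])]
    return f"{event['task_id']} {event['status']} {urls}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the callback is a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_callback(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 3000) -> None:
    """Listen on the port used in the callback_url example above."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()


# Offline check with the sample payload; call serve() to accept real callbacks.
sample = b'{"task_id": "abc123", "status": "completed", "images": [{"index": 0, "url": "/task/abc123/image/0"}]}'
print(summarize_callback(sample))
```

The `url` values are relative, so the receiver still has to prefix them with the server address (for example `http://localhost:8420`) before downloading.
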
---

## 6. Full Pipeline: Game Asset Turnaround

Complete workflow for generating an 8-way isometric sprite sheet:

```bash
API=http://localhost:8420
OUT=./sprites
mkdir -p "$OUT"

# Submit a JSON body to /generate and print the task ID
submit() {
  curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "$1" | jq -r .task_id
}

# Poll until the task completes; new tasks start as "queued", so keep
# waiting on any non-terminal status and stop early on "failed"
wait_task() {
  local status
  while true; do
    status=$(curl -s "$API/task/$1" | jq -r .status)
    case "$status" in
      completed) return 0 ;;
      failed) echo "Task $1 failed" >&2; return 1 ;;
    esac
    sleep 10
  done
}

# Stage 1: Anchor sprite
TASK=$(submit '{
  "prompt": "Single SE-facing isometric pixel art knight, dark armor, red cape, sword and shield, FFT style, solid #00FF00 green background, no shadow"
}')

echo "Stage 1: $TASK"
wait_task "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/anchor.png"
CONV=$(curl -s "$API/task/$TASK" | jq -r .conversation_id)
echo "Anchor saved. Conversation: $CONV"

# Stage 2: Cardinal facings (multi-turn, Gemini remembers the knight)
TASK=$(submit "{
  \"prompt\": \"Now create a 2x2 sprite sheet of this SAME knight from 4 angles: top-left=North (back view), top-right=East (right side), bottom-left=South (front view), bottom-right=SE (same as before). Same style, same green background.\",
  \"conversation_id\": \"$CONV\"
}")

echo "Stage 2: $TASK"
wait_task "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/cardinals.png"

# Stage 3: Diagonal facings
TASK=$(submit "{
  \"prompt\": \"Now create 4 diagonal views in a 2x2 grid: NW (mostly back + left side), NE (mostly back + right side), SW (mostly front + left side), SE (mostly front + right side). Same character, same style.\",
  \"conversation_id\": \"$CONV\"
}")

echo "Stage 3: $TASK"
wait_task "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/diagonals.png"

# Stage 4: Assembly (local code, no API needed)
# python3 assemble.py $OUT/cardinals.png $OUT/diagonals.png $OUT/turnaround.png

# Stage 5: Walk animation
TASK=$(submit "{
  \"prompt\": \"Create a short looping walk cycle video of this knight, isometric SW-facing, walking in place\",
  \"conversation_id\": \"$CONV\",
  \"type\": \"video\",
  \"timeout_secs\": 300
}")

echo "Stage 5: $TASK"
wait_task "$TASK" || exit 1
curl -s "$API/task/$TASK/image/0" -o "$OUT/walk.mp4"

echo "Done! Files in $OUT/"
ls -la "$OUT/"
```

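
The assembly step in Stage 4 is left as a commented-out `assemble.py`, which this document does not include. As a starting point, the sketch below computes the crop box of each cell in a 2x2 sheet. It assumes a 1024x1024 sheet divided evenly, which Gemini does not guarantee; `grid_boxes` is a hypothetical helper.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def grid_boxes(width: int, height: int, labels: Tuple[str, str, str, str]) -> Dict[str, Box]:
    """Split a sheet into a 2x2 grid and label cells in reading order
    (top-left, top-right, bottom-left, bottom-right)."""
    half_w, half_h = width // 2, height // 2
    cells = [
        (0, 0, half_w, half_h),
        (half_w, 0, width, half_h),
        (0, half_h, half_w, height),
        (half_w, half_h, width, height),
    ]
    return dict(zip(labels, cells))


# Labels follow the layout requested in the Stage 2 prompt; the Stage 3
# prompt does not fix cell positions, so its order here is a guess.
cardinals = grid_boxes(1024, 1024, ("N", "E", "S", "SE"))
diagonals = grid_boxes(1024, 1024, ("NW", "NE", "SW", "SE"))
print(cardinals["E"], diagonals["SW"])
```

The boxes use the (left, top, right, bottom) order that Pillow's `Image.crop` accepts, so each cell can be cut out and pasted into the final turnaround sheet. It is worth checking the generated sheets by eye first, since the model may not place the cells exactly on the grid.
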
---

## 7. TypeScript/Bun Client

```typescript
const API = "http://localhost:8420";

async function generate(opts: {
  prompt: string;
  type?: "image" | "video";
  referenceImages?: string[];
  conversationId?: string;
}) {
  // Submit task
  const { task_id } = await fetch(`${API}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: opts.prompt,
      type: opts.type ?? "image",
      reference_images: opts.referenceImages,
      conversation_id: opts.conversationId,
    }),
  }).then((r) => r.json());

  // Poll until done
  while (true) {
    const task = await fetch(`${API}/task/${task_id}`).then((r) => r.json());
    if (task.status === "completed") return task;
    if (task.status === "failed") throw new Error(task.error);
    await Bun.sleep(5000);
  }
}

// Usage
const anchor = await generate({
  prompt: "Isometric pixel art knight, FFT style, green #00FF00 bg",
});
console.log("Anchor:", anchor.images[0].url);

// Multi-turn: iterate on the same character
const refined = await generate({
  prompt: "Make the sword bigger and add a glowing effect",
  conversationId: anchor.conversation_id,
});

// Save image
const img = await fetch(`${API}${refined.images[0].url}`);
await Bun.write("knight.png", img);
```

---

## 8. Python Client

```python
import requests, time

API = "http://localhost:8420"

def generate(prompt, type="image", reference_images=None, conversation_id=None, timeout=180):
    """Submit generation task and wait for result."""
    resp = requests.post(f"{API}/generate", json={
        "prompt": prompt,
        "type": type,
        "reference_images": reference_images,
        "conversation_id": conversation_id,
        "timeout_secs": timeout,
    })
    task_id = resp.json()["task_id"]

    while True:
        task = requests.get(f"{API}/task/{task_id}").json()
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(task["error"])
        time.sleep(5)

def download(task, index=0, path="output.png"):
    """Download generated file."""
    url = f"{API}{task['images'][index]['url']}"
    with open(path, "wb") as f:
        f.write(requests.get(url).content)

# Text-to-image
result = generate("pixel art knight, isometric, green bg")
download(result, path="knight.png")

# Multi-turn
result2 = generate(
    "Now show 4 directions in a 2x2 grid",
    conversation_id=result["conversation_id"]
)
download(result2, path="directions.png")

# Video
video = generate(
    "Walk cycle animation of this knight",
    type="video",
    conversation_id=result["conversation_id"],
    timeout=300
)
download(video, path="walk.mp4")
```