feat: add phantom-canvas skill for Gemini image generation

Kunthawat Greethong
2026-05-26 12:43:48 +07:00
parent b1bb6cbedc
commit a7477db220
5 changed files with 550 additions and 0 deletions


@@ -0,0 +1,215 @@
---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys — uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---
# Phantom Canvas
**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing — just your Google account. A persistent Chrome browser runs in the background via CDP, automating Gemini's web UI.
Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.
![Phantom Canvas logo](logo.png)
## Quick Check
```bash
# Verify installation
phantom-canvas --help
```
If you see the logo and command list, it's installed. If not:
```bash
npm install -g phantom-canvas
```
## First-Time Setup (Required Once)
The first time, Chrome needs to open visibly so you can log into your Google account:
```bash
# Open Chrome — a window appears. Log into Google in that window.
phantom-canvas chrome
# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```
After logging in once, the session persists at `~/.phantom-canvas/chrome-profile/`. All future runs can be headless.
**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and re-login. Do NOT automate login — it requires human interaction with Google auth.
## CLI Generate (for agents and scripts)
The result is printed as JSON on stdout; logs go to stderr. Images are downloaded at full resolution (1024px).
### Text-to-Image
```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```
Returns JSON:
```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```
### Image-to-Image (Reference Upload)
Use an existing image as visual reference:
```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```
Reference path must be absolute or relative to cwd.
### Multi-Turn (Iterative Design)
Continue in the same Gemini conversation to refine:
```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)
# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png
# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```
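The same pattern extends to any number of rounds. The following is a minimal Python sketch. `run_rounds` and the `run` callable are hypothetical names: `run(prompt, output, conversation)` could wrap `phantom-canvas generate ... --conversation <id>` with `subprocess` and return the parsed JSON. The sketch assumes each result carries a `conversation_id`, as in the JSON output shown earlier, and always reuses the most recent one.

```python
def run_rounds(run, rounds):
    """Run (prompt, output) rounds in one Gemini conversation (hypothetical helper).

    `run(prompt, output, conversation)` returns the CLI's JSON result as a dict.
    The first round starts a new conversation; each later round passes the
    conversation_id returned by the previous round.
    """
    conversation = None
    results = []
    for prompt, output in rounds:
        result = run(prompt, output, conversation)
        # Carry the id forward; keep the previous one if a result omits it
        conversation = result.get("conversation_id", conversation)
        results.append(result)
    return results
```

Injecting `run` keeps the chaining logic separate from process handling, so you can test it without a browser.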
### Video Generation
```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```
Video generation takes 1-2 minutes, and Gemini enforces daily video quotas.
### CLI Options
| Flag | Description |
|---|---|
| `-o, --output <file>` | Output file path |
| `--ref <file>` | Reference image (absolute path, or relative to the current directory) |
| `--video` | Generate video instead of image |
| `--conversation <id>` | Continue previous conversation |
| `--timeout <secs>` | Timeout in seconds (default: 180 for images, 300 for video) |
| `--headed` | Show browser window (default: headless) |
| `--cdp <url>` | Chrome DevTools URL (default: http://127.0.0.1:9222) |
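The CLI can also be driven from Python with `subprocess`. This is a minimal sketch that uses only the flags in the table above and relies on the JSON-on-stdout behavior described under CLI Generate. `build_generate_args` and `generate` are illustrative names, not part of Phantom Canvas. The sketch also assumes, without verification, that a failed run exits with a non-zero status.

```python
import json
import subprocess


def build_generate_args(prompt, output, ref=None, conversation=None,
                        video=False, timeout=None, headed=False):
    """Translate keyword arguments into `phantom-canvas generate` flags."""
    args = ["phantom-canvas", "generate", prompt, "-o", output]
    if ref is not None:
        args += ["--ref", ref]
    if conversation is not None:
        args += ["--conversation", conversation]
    if video:
        args.append("--video")
    if timeout is not None:
        args += ["--timeout", str(timeout)]
    if headed:
        args.append("--headed")
    return args


def generate(prompt, output, **options):
    """Run the CLI and parse the JSON result it prints on stdout."""
    proc = subprocess.run(
        build_generate_args(prompt, output, **options),
        capture_output=True,  # stdout carries the JSON result, stderr the logs
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(proc.stdout)
```

The prompt is passed as an argv list rather than a shell string, so prompts containing quotes or `#` need no escaping. For example, `generate("walk cycle animation", "walk.mp4", ref="/abs/knight.png", video=True)`.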
## HTTP API (Server Mode)
For programmatic access from apps, pipelines, or webhooks:
```bash
# Start the server
phantom-canvas serve  # listens on :8420; pass --port <n> to use another port
```
### Generate an Image
```bash
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```
Returns immediately with a task ID:
```json
{"task_id": "abc123", "status": "queued"}
```
### Check Task Status
```bash
curl localhost:8420/task/abc123
```
```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```
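Until the task finishes, you poll this endpoint. Below is a minimal standard-library sketch; `fetch_task` and `wait_for_task` are illustrative names. It treats any status other than `completed` or `failed` (for example `queued` or `running`) as still pending. The `failed` status and its `error` field are taken from the client examples for this API. The default deadline of 300 s mirrors the video timeout.

```python
import json
import time
import urllib.request

API = "http://localhost:8420"


def fetch_task(task_id):
    """GET /task/<id> and decode the JSON body."""
    with urllib.request.urlopen(f"{API}/task/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=fetch_task, interval=5.0, deadline=300.0, sleep=time.sleep):
    """Poll until the task is completed or failed, or the deadline passes."""
    waited = 0.0  # total time spent sleeping, not wall-clock time
    while True:
        task = fetch(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":
            raise RuntimeError(task.get("error", f"task {task_id} failed"))
        if waited >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A typical call is `wait_for_task("abc123")["images"][0]["url"]`. Because `fetch` and `sleep` are parameters, you can test the loop without a running server.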
### Download Result
```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```
### With Reference Image
```bash
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "same character from 4 angles in a 2x2 grid",
"reference_images": ["/absolute/path/to/sprite.png"]
}'
```
> `reference_images` must be **absolute local file paths**.
### Check Server Health
```bash
curl localhost:8420/health
```
### API Parameters (POST /generate)
| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | — | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout (image / video) |
| `callback_url` | string | — | Webhook URL for async completion |
| `conversation_id` | string | — | Continue existing conversation |
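The table maps directly onto a request body. This is a minimal sketch that applies the documented defaults and the absolute-path rule before anything is sent. `build_generate_payload` is a hypothetical helper, and the sketch assumes the server accepts the defaults sent explicitly rather than omitted.

```python
import os


def build_generate_payload(prompt, type="image", reference_images=None,
                           num_images=1, timeout_secs=None,
                           callback_url=None, conversation_id=None):
    """Build a POST /generate body from the documented fields and defaults."""
    if not prompt:
        raise ValueError("prompt is required")
    if type not in ("image", "video"):
        raise ValueError('type must be "image" or "video"')
    for path in reference_images or []:
        if not os.path.isabs(path):
            raise ValueError(f"reference image must be an absolute path: {path}")
    if timeout_secs is None:
        # Documented defaults: 180 s for images, 300 s for video
        timeout_secs = 300 if type == "video" else 180
    payload = {"prompt": prompt, "type": type,
               "num_images": num_images, "timeout_secs": timeout_secs}
    # Optional fields are included only when set (no JSON nulls)
    optional = {"reference_images": reference_images,
                "callback_url": callback_url,
                "conversation_id": conversation_id}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Serialize the result with `json.dumps(...)` and send it with `Content-Type: application/json`.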
## Best Practices for Website Images
### Use #00FF00 Green Background
Instead of asking for "transparent background", use:
```
"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
```
Gemini interprets "transparent" as a checkerboard pattern. Green screen is easy to chroma-key in post-processing.
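Removing the green afterwards is a per-pixel test. Here is a minimal pure-Python sketch using a hard threshold. `chroma_key` and the default `tolerance` of 60 are illustrative choices, not part of Phantom Canvas. Any pixel within that Euclidean distance of #00FF00 becomes fully transparent. The sketch does not handle soft edges or green spill from anti-aliasing.

```python
def chroma_key(pixels, tolerance=60):
    """Make near-#00FF00 pixels transparent.

    `pixels` is a sequence of (r, g, b) tuples with values 0..255. Returns
    (r, g, b, a) tuples: alpha 0 within `tolerance` of pure green, else 255.
    """
    out = []
    for r, g, b in pixels:
        # Squared Euclidean distance from pure green (0, 255, 0)
        dist2 = r * r + (255 - g) ** 2 + b * b
        alpha = 0 if dist2 <= tolerance * tolerance else 255
        out.append((r, g, b, alpha))
    return out
```

With an imaging library such as Pillow, you can read the pixel list from the downloaded PNG and write the result back as a new RGBA image.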
### Iterative Workflow
For website hero images or banners, use multi-turn:
1. First generate a base concept
2. Use `--conversation` to refine colors, layout, text
3. Download the final version
### Prompt Tips
- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
- Specify dimensions and composition when possible
- For website images, mention the context: "hero banner for a tech startup website"
## Error Handling
| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set Chrome path |
| "Session expired" | Run `phantom-canvas chrome`, re-login in the browser window |
| Timeout / empty images | Retry with different prompt or longer `--timeout` |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Start the server on a different port, e.g. `--port 8430` |
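For the timeout row, a retry loop that doubles the timeout on each attempt can be sketched as follows. `generate_with_retry` and the `run(prompt, output, timeout)` callable are hypothetical. The sketch assumes a failed run surfaces in one of three ways: as `subprocess.CalledProcessError`, as `TimeoutError`, or as a result whose `status` is not `completed`. It does not single out "Session expired" or quota errors. Those need human action, so retrying them will not help.

```python
import subprocess


def generate_with_retry(run, prompt, output, timeout=180, attempts=3, factor=2):
    """Retry a generation, multiplying the timeout after each failed attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            result = run(prompt, output, timeout)
            if result.get("status") == "completed":
                return result
            last_error = RuntimeError(f"unexpected result: {result}")
        except (subprocess.CalledProcessError, TimeoutError) as exc:
            last_error = exc
        timeout *= factor  # attempts use 180, 360, 720 s with the defaults
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```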

Binary file not shown (image, 379 KiB).


@@ -0,0 +1,28 @@
# AGENTS.md
CLI tool and HTTP API for AI image/video generation via Gemini Web.
## What this tool does
Phantom Canvas wraps Gemini Web as a programmable CLI and HTTP API. It launches Chrome via CDP, automates Gemini's web UI, and exposes generation capabilities for AI agents, scripts, and applications.
## How to use
```bash
bun add -g phantom-canvas # or: npm install -g phantom-canvas
phantom-canvas generate "your prompt" -o output.png --headed # first time: log in to Google in Chrome
phantom-canvas generate "your prompt" -o output.png # after that: headless
```
See [SKILL.md](SKILL.md) for complete agent instructions.
## Architecture
- `index.ts` — CLI entry point (chrome / generate / serve)
- `lib/browser.ts` — Browser automation (Chrome CDP + Playwright)
- `lib/tasks.ts` — Async task queue
- `dist/index.js` — Compiled Node.js bundle
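The HTTP API exposes task states (`queued`, `running`, `completed`, `failed`). A task queue that produces them can be modeled roughly as follows. This is a hypothetical Python sketch, not the contents of `lib/tasks.ts`. It assumes tasks run one at a time because they share a single browser session. `TaskQueue` and `handler` are illustrative names.

```python
import queue
import threading
import uuid


class TaskQueue:
    """Hypothetical serial task queue: one worker thread, one browser session.

    Tasks move through queued -> running -> completed | failed.
    """

    def __init__(self, handler):
        self.handler = handler  # callable(params) -> dict of result fields
        self.tasks = {}  # task_id -> task record
        self.pending = queue.Queue()
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, params):
        """Record a queued task and return at once, like POST /generate."""
        task_id = uuid.uuid4().hex[:16]
        with self._lock:
            self.tasks[task_id] = {"task_id": task_id, "status": "queued"}
        self.pending.put((task_id, params))
        return {"task_id": task_id, "status": "queued"}

    def get(self, task_id):
        """Return a snapshot of one task record, like GET /task/<id>."""
        with self._lock:
            return dict(self.tasks[task_id])

    def _update(self, task_id, fields):
        with self._lock:
            self.tasks[task_id].update(fields)

    def _worker(self):
        while True:
            task_id, params = self.pending.get()
            self._update(task_id, {"status": "running"})
            try:
                result = self.handler(params)
                self._update(task_id, {**result, "status": "completed"})
            except Exception as exc:  # record the failure; keep the worker alive
                self._update(task_id, {"status": "failed", "error": str(exc)})
            finally:
                self.pending.task_done()
```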
## Session
Chrome stores the login session in `~/.phantom-canvas/chrome-profile/`. The first run requires `--headed` so you can log in interactively. After that, the login persists and headless mode works.

Binary file not shown (image, 5.9 MiB).


@@ -0,0 +1,307 @@
# Phantom Canvas — Usage Examples
## Setup
```bash
bun install -g github:baixianger/phantom-canvas
phantom-canvas chrome # first time: log in to Google in the Chrome window
phantom-canvas serve # start the HTTP server on :8420
```
---
## 1. Text-to-Image
Generate an image from a text prompt. Each request starts a new Gemini conversation.
```bash
# Submit
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Isometric pixel art knight with sword and shield, Final Fantasy Tactics style, on solid green #00FF00 chroma-key background, standing idle pose"
}'
# => {"task_id": "abc123", "status": "queued"}
# Poll status
curl localhost:8420/task/abc123
# => {"status": "completed", "conversation_id": "05695dfd143c4dad", "images": [...]}
# Download image
curl localhost:8420/task/abc123/image/0 -o knight.png
```
---
## 2. Image-to-Image (Reference Upload)
Upload a reference image so Gemini keeps the same character design.
```bash
# Generate anchor sprite first
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "SE-facing isometric pixel art pirate, red bandana, blue tunic, FFT style, #00FF00 bg"}'
# Wait for the task to complete, then download the image as pirate.png
# Use it as reference for 4-direction sheet
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Using the uploaded character, create a 2x2 sprite sheet: top-left=North (back), top-right=East (right side), bottom-left=South (front), bottom-right=SE (same as ref). Same pixel art style, same green #00FF00 background.",
"reference_images": ["/absolute/path/to/pirate.png"]
}'
```
> `reference_images` must be **absolute local file paths**. The browser uploads them through Gemini's file upload UI.
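A small guard catches relative or mistyped paths before the request is sent. This is a minimal sketch with the hypothetical helper `absolute_refs`. It only helps when the server runs on the same machine, which the absolute-local-path requirement implies.

```python
from pathlib import Path


def absolute_refs(paths):
    """Resolve reference images to absolute paths and check they exist."""
    resolved = []
    for p in paths:
        path = Path(p).expanduser().resolve()  # "~" and relative paths -> absolute
        if not path.is_file():
            raise FileNotFoundError(f"reference image not found: {path}")
        resolved.append(str(path))
    return resolved
```

For example, `{"prompt": "...", "reference_images": absolute_refs(["pirate.png"])}` sends the full path of `pirate.png` in the current directory.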
---
## 3. Multi-Turn Conversation
Continue in the same Gemini chat to iterate on a design. Pass `conversation_id` from a previous task.
```bash
# Step 1: initial generation
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Pixel art knight character, isometric, green background"}'
# => {"task_id": "aaa", "status": "queued"}
# Get conversation_id from result
curl localhost:8420/task/aaa
# => {"conversation_id": "05695dfd143c4dad", "images": [...]}
# Step 2: refine in same conversation — Gemini remembers context
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Now make the sword larger and add a red cape",
"conversation_id": "05695dfd143c4dad"
}'
# Step 3: generate variations
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Show this character from 4 different angles in a 2x2 grid",
"conversation_id": "05695dfd143c4dad"
}'
```
> Multi-turn is useful for iterative design. Gemini keeps the visual context from previous messages.
---
## 4. Video Generation
Generate walk-cycle animations. Generation takes 1-2 minutes, and Gemini enforces daily video quotas.
```bash
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Short looping video of a pixel art knight walking in place, isometric view, Final Fantasy Tactics style",
"type": "video",
"timeout_secs": 300
}'
```
With reference image:
```bash
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Looping walk cycle animation of this exact character",
"reference_images": ["/absolute/path/to/knight.png"],
"type": "video",
"timeout_secs": 300
}'
```
---
## 5. Webhook Callback
Get notified when generation completes instead of polling.
```bash
curl -X POST localhost:8420/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "pixel art mage with staff, isometric, green bg",
"callback_url": "http://localhost:3000/webhook"
}'
```
Your webhook receives:
```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0"}]
}
```
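A minimal receiver for `callback_url` can be built with the Python standard library alone. It assumes the server delivers the payload above as a JSON POST body with a `Content-Length` header. Neither the method nor the headers are documented here, so treat both as assumptions. `WebhookHandler`, `received` and `serve` are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # every payload delivered so far


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept webhook POSTs on any path (e.g. /webhook) and record them."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        received.append(payload)
        if payload.get("status") == "completed":
            for image in payload.get("images", []):
                # Image URLs are relative to the Phantom Canvas server
                print(f"{payload.get('task_id')}: http://localhost:8420{image['url']}")
        self.send_response(204)  # acknowledge with an empty body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request log lines


def serve(port=3000):
    """Handle callbacks on localhost:<port> until interrupted."""
    HTTPServer(("localhost", port), WebhookHandler).serve_forever()
```

Calling `serve()` listens on port 3000, which matches the `http://localhost:3000/webhook` callback URL in the request above.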
---
## 6. Full Pipeline — Game Asset Turnaround
Complete workflow for generating an 8-way isometric sprite sheet:
```bash
API=http://localhost:8420
OUT=./sprites
mkdir -p "$OUT"
# Poll a task until it completes; abort the script if it fails.
# $2 = poll interval in seconds (default 10). A fresh task may still be "queued",
# so wait for "completed" instead of looping only while "running".
wait_for() {
  local status
  while true; do
    status=$(curl -s "$API/task/$1" | jq -r .status)
    case "$status" in
      completed) return 0 ;;
      failed) echo "Task $1 failed" >&2; exit 1 ;;
    esac
    sleep "${2:-10}"
  done
}
# Stage 1: Anchor sprite
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d '{
"prompt": "Single SE-facing isometric pixel art knight, dark armor, red cape, sword and shield, FFT style, solid #00FF00 green background, no shadow"
}' | jq -r .task_id)
echo "Stage 1: $TASK"
wait_for "$TASK"
curl -s "$API/task/$TASK/image/0" -o "$OUT/anchor.png"
CONV=$(curl -s "$API/task/$TASK" | jq -r .conversation_id)
echo "Anchor saved. Conversation: $CONV"
# Stage 2: Cardinal facings (multi-turn, Gemini remembers the knight)
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
\"prompt\": \"Now create a 2x2 sprite sheet of this SAME knight from 4 angles: top-left=North (back view), top-right=East (right side), bottom-left=South (front view), bottom-right=SE (same as before). Same style, same green background.\",
\"conversation_id\": \"$CONV\"
}" | jq -r .task_id)
echo "Stage 2: $TASK"
wait_for "$TASK"
curl -s "$API/task/$TASK/image/0" -o "$OUT/cardinals.png"
# Stage 3: Diagonal facings
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
\"prompt\": \"Now create 4 diagonal views in a 2x2 grid: NW (mostly back + left side), NE (mostly back + right side), SW (mostly front + left side), SE (mostly front + right side). Same character, same style.\",
\"conversation_id\": \"$CONV\"
}" | jq -r .task_id)
echo "Stage 3: $TASK"
wait_for "$TASK"
curl -s "$API/task/$TASK/image/0" -o "$OUT/diagonals.png"
# Stage 4: Assembly (local code, no API needed)
# python3 assemble.py "$OUT/cardinals.png" "$OUT/diagonals.png" "$OUT/turnaround.png"
# Stage 5: Walk animation (video is slower, so poll every 15 s)
TASK=$(curl -s -X POST "$API/generate" -H "Content-Type: application/json" -d "{
\"prompt\": \"Create a short looping walk cycle video of this knight, isometric SW-facing, walking in place\",
\"conversation_id\": \"$CONV\",
\"type\": \"video\",
\"timeout_secs\": 300
}" | jq -r .task_id)
echo "Stage 5: $TASK"
wait_for "$TASK" 15
curl -s "$API/task/$TASK/image/0" -o "$OUT/walk.mp4"
echo "Done! Files in $OUT/"
ls -la "$OUT/"
```
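The `assemble.py` in Stage 4 is left to you. Its core is deciding which cell of which sheet supplies each facing. This is a hypothetical sketch of that mapping. It makes two assumptions. First, the Stage 3 grid follows the prompt's order row by row (NW, NE / SW, SE). Second, because no West view is requested, West is produced by mirroring the East cell. The SE view comes from the cardinals sheet, which was anchored to the original sprite.

```python
# Cell (column, row) of each facing, as requested in the Stage 2 and 3 prompts
CARDINAL_CELLS = {"N": (0, 0), "E": (1, 0), "S": (0, 1), "SE": (1, 1)}
DIAGONAL_CELLS = {"NW": (0, 0), "NE": (1, 0), "SW": (0, 1), "SE": (1, 1)}


def cell_box(width, height, col, row):
    """Crop box (left, upper, right, lower) for one cell of a 2x2 sheet."""
    w, h = width // 2, height // 2
    return (col * w, row * h, (col + 1) * w, (row + 1) * h)


def turnaround_plan(card_size, diag_size):
    """Map each of the 8 facings to (sheet name, crop box, mirror flag)."""
    plan = {}
    for name, (col, row) in CARDINAL_CELLS.items():
        plan[name] = ("cardinals", cell_box(*card_size, col, row), False)
    for name, (col, row) in DIAGONAL_CELLS.items():
        if name not in plan:  # keep SE from the cardinals sheet
            plan[name] = ("diagonals", cell_box(*diag_size, col, row), False)
    # No West cell was requested: mirror East horizontally (assumption)
    plan["W"] = ("cardinals", cell_box(*card_size, *CARDINAL_CELLS["E"]), True)
    return plan
```

With Pillow, apply each entry as `sheets[name].crop(box)`, followed by `ImageOps.mirror(...)` when the flag is set. Then paste the tiles onto one canvas.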
---
## 7. TypeScript/Bun Client
```typescript
const API = "http://localhost:8420";

async function generate(opts: {
  prompt: string;
  type?: "image" | "video";
  referenceImages?: string[];
  conversationId?: string;
}) {
  // Submit task
  const { task_id } = await fetch(`${API}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: opts.prompt,
      type: opts.type ?? "image",
      reference_images: opts.referenceImages,
      conversation_id: opts.conversationId,
    }),
  }).then((r) => r.json());

  // Poll until done
  while (true) {
    const task = await fetch(`${API}/task/${task_id}`).then((r) => r.json());
    if (task.status === "completed") return task;
    if (task.status === "failed") throw new Error(task.error);
    await Bun.sleep(5000);
  }
}

// Usage
const anchor = await generate({
  prompt: "Isometric pixel art knight, FFT style, green #00FF00 bg",
});
console.log("Anchor:", anchor.images[0].url);

// Multi-turn: iterate on the same character
const refined = await generate({
  prompt: "Make the sword bigger and add a glowing effect",
  conversationId: anchor.conversation_id,
});

// Save image
const img = await fetch(`${API}${refined.images[0].url}`);
await Bun.write("knight.png", img);
```
---
## 8. Python Client
```python
import time

import requests

API = "http://localhost:8420"


def generate(prompt, type="image", reference_images=None, conversation_id=None, timeout=180):
    """Submit a generation task and poll until it completes or fails."""
    payload = {
        "prompt": prompt,
        "type": type,
        "reference_images": reference_images,
        "conversation_id": conversation_id,
        "timeout_secs": timeout,
    }
    # Omit unset optional fields instead of sending JSON null
    payload = {k: v for k, v in payload.items() if v is not None}
    resp = requests.post(f"{API}/generate", json=payload)
    resp.raise_for_status()
    task_id = resp.json()["task_id"]
    while True:
        task = requests.get(f"{API}/task/{task_id}").json()
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(task.get("error", "generation failed"))
        time.sleep(5)


def download(task, index=0, path="output.png"):
    """Download a generated file (image or video) to a local path."""
    url = f"{API}{task['images'][index]['url']}"
    resp = requests.get(url)
    resp.raise_for_status()
    with open(path, "wb") as f:
        f.write(resp.content)


# Text-to-image
result = generate("pixel art knight, isometric, green bg")
download(result, path="knight.png")

# Multi-turn
result2 = generate(
    "Now show 4 directions in a 2x2 grid",
    conversation_id=result["conversation_id"],
)
download(result2, path="directions.png")

# Video
video = generate(
    "Walk cycle animation of this knight",
    type="video",
    conversation_id=result["conversation_id"],
    timeout=300,
)
download(video, path="walk.mp4")
```