feat: add phantom-canvas skill for Gemini image generation

2026-05-26 12:43:48 +07:00
parent b1bb6cbedc
commit a7477db220
5 changed files with 550 additions and 0 deletions
--- a/skills/phantom-canvas/SKILL.md
+++ b/skills/phantom-canvas/SKILL.md
@@ -0,0 +1,215 @@
+---
+name: phantom-canvas
+description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys — uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
+allowed-tools: Bash
+---
+
+# Phantom Canvas
+
+**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing — just your Google account. A persistent Chrome browser runs in the background via CDP, automating Gemini's web UI.
+
+Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.
+
+![Phantom Canvas logo](logo.png)
+
+## Quick Check
+
+```bash
+# Verify installation
+phantom-canvas --help
+```
+
+If you see the logo and command list, it's installed. If not:
+
+```bash
+npm install -g phantom-canvas
+```
+
+## First-Time Setup (Required Once)
+
+The first time, Chrome needs to open visibly so you can log into your Google account:
+
+```bash
+# Open Chrome — a window appears. Log into Google in that window.
+phantom-canvas chrome
+
+# Or just generate with --headed:
+phantom-canvas generate "test image" --headed
+```
+
+After logging in once, the session persists at `~/.phantom-canvas/chrome-profile/`. All future runs can be headless.
+
+**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and log in again. Do NOT automate login: it requires human interaction with Google auth.
+
+## CLI Generate (for agents and scripts)
+
+Output is JSON on stdout. Logs go to stderr. Images are downloaded at full resolution (1024px).
+
+### Text-to-Image
+
+```bash
+# Simple generate
+phantom-canvas generate "your prompt here" -o output.png
+```
+
+Returns JSON:
+```json
+{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
+```
+
+### Image-to-Image (Reference Upload)
+
+Use an existing image as visual reference:
+
+```bash
+phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
+```
+
+Reference path must be absolute or relative to cwd.
+
+### Multi-Turn (Iterative Design)
+
+Continue in the same Gemini conversation to refine:
+
+```bash
+# Round 1
+RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
+CONV=$(echo "$RESULT" | jq -r .conversation_id)
+
+# Round 2 — Gemini remembers the character
+phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png
+
+# Round 3
+phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
+```
+
+### Video Generation
+
+```bash
+phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
+```
+
+Video takes 1-2 minutes. Gemini has daily video quotas.
+
+### CLI Options
+
+| Flag | Description |
+|---|---|
+| `-o, --output <file>` | Output file path |
+| `--ref <file>` | Reference image (absolute path, or relative to the current directory) |
+| `--video` | Generate video instead of image |
+| `--conversation <id>` | Continue previous conversation |
+| `--timeout <secs>` | Timeout (default: 180 image, 300 video) |
+| `--headed` | Show browser window (default: headless) |
+| `--cdp <url>` | Chrome DevTools URL (default: http://127.0.0.1:9222) |
+
+## HTTP API (Server Mode)
+
+For programmatic access from apps, pipelines, or webhooks:
+
+```bash
+# Start the server
+phantom-canvas serve [--port 8420]
+```
+
+### Generate an Image
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
+```
+
+Returns immediately with a task ID:
+```json
+{"task_id": "abc123", "status": "queued"}
+```
+
+### Check Task Status
+
+```bash
+curl localhost:8420/task/abc123
+```
+
+```json
+{
+  "task_id": "abc123",
+  "status": "completed",
+  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
+  "conversation_id": "05695dfd143c4dad",
+  "elapsed_secs": 45.3
+}
+```
+
+### Download Result
+
+```bash
+curl localhost:8420/task/abc123/image/0 -o result.png
+```
+
+### With Reference Image
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "same character from 4 angles in a 2x2 grid",
+    "reference_images": ["/absolute/path/to/sprite.png"]
+  }'
+```
+
+> `reference_images` must be **absolute local file paths**.
+
+### Check Server Health
+
+```bash
+curl localhost:8420/health
+```
+
+### API Parameters (POST /generate)
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `prompt` | string | required | Generation prompt |
+| `type` | `"image"` \| `"video"` | `"image"` | Output type |
+| `reference_images` | string[] | — | Absolute paths to reference images |
+| `num_images` | number | 1 | How many images to generate |
+| `timeout_secs` | number | 180 / 300 | Timeout (image / video) |
+| `callback_url` | string | — | Webhook URL for async completion |
+| `conversation_id` | string | — | Continue existing conversation |
+
+## Best Practices for Website Images
+
+### Use #00FF00 Green Background
+
+Instead of asking for "transparent background", use:
+
+```
+"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
+```
+
+Gemini interprets "transparent" as a checkerboard pattern. Green screen is easy to chroma-key in post-processing.
+
+### Iterative Workflow
+
+For website hero images or banners, use multi-turn:
+
+1. First generate a base concept
+2. Use `--conversation` to refine colors, layout, text
+3. Download the final version
+
+### Prompt Tips
+
+- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
+- Specify dimensions and composition when possible
+- For website images, mention the context: "hero banner for a tech startup website"
+
+## Error Handling
+
+| Error | Action |
+|---|---|
+| Chrome failed to start | Install Chrome or set the Chrome path |
+| "Session expired" | Run `phantom-canvas chrome`, then log in again in the browser window |
+| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
+| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
+| Port already in use | Use `--port 8430` or another port |
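
The error table in SKILL.md can be folded into a small wrapper for agents that call the CLI. The following is a minimal Python sketch, not part of this commit. It assumes the CLI prints a single JSON object on stdout as described under "CLI Generate", that a failed run shows up as a non-zero exit, empty stdout, or a `status` other than `completed`, and that the "Session expired" message appears on stderr with the other logs. None of those failure details are documented here, so treat them as assumptions. `build_args`, `parse_result`, and `generate` are hypothetical helper names.

```python
import json
import subprocess


def build_args(prompt, output, ref=None, conversation=None, video=False, timeout=None):
    """Build the argv for `phantom-canvas generate` from the documented flags."""
    args = ["phantom-canvas", "generate", prompt, "-o", output]
    if ref:
        args += ["--ref", ref]
    if conversation:
        args += ["--conversation", conversation]
    if video:
        args.append("--video")
    if timeout is not None:
        args += ["--timeout", str(timeout)]
    return args


def parse_result(stdout):
    """Parse the JSON result and raise if the status is not 'completed'."""
    result = json.loads(stdout)
    if result.get("status") != "completed":
        raise RuntimeError(f"generation did not complete: {result}")
    return result


def generate(prompt, output, retries=1, **flags):
    """Run the CLI, retrying when the output is missing or not 'completed'."""
    last_error = None
    for _ in range(retries + 1):
        proc = subprocess.run(build_args(prompt, output, **flags),
                              capture_output=True, text=True)
        # Assumption: the session message is logged to stderr.
        # Login cannot be automated, so stop and surface it to the user.
        if "Session expired" in proc.stderr:
            raise RuntimeError("Session expired: run `phantom-canvas chrome` and log in")
        try:
            return parse_result(proc.stdout)
        except (ValueError, RuntimeError) as err:  # bad JSON or bad status
            last_error = err
    raise RuntimeError(f"generation failed after {retries + 1} attempts: {last_error}")
```

A call such as `generate("pixel art knight, green bg", "knight.png", timeout=240)` would return the parsed result, including `conversation_id` for later rounds. Keeping the argv builder and the parser separate from the subprocess call makes both easy to check without launching Chrome.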
--- a/skills/phantom-canvas/logo.png
+++ b/skills/phantom-canvas/logo.png
--- a/skills/phantom-canvas/references/AGENTS.md
+++ b/skills/phantom-canvas/references/AGENTS.md
@@ -0,0 +1,28 @@
+# AGENTS.md
+
+CLI tool and HTTP API for AI image/video generation via Gemini Web.
+
+## What this tool does
+
+Phantom Canvas wraps Gemini Web as a programmable CLI and HTTP API. It launches Chrome via CDP, automates Gemini's web UI, and exposes generation capabilities for AI agents, scripts, and applications.
+
+## How to use
+
+```bash
+bun add -g phantom-canvas       # or: npm install -g phantom-canvas
+phantom-canvas generate "your prompt" -o output.png --headed  # first time: log in via Chrome
+phantom-canvas generate "your prompt" -o output.png           # after that: headless
+```
+
+See [SKILL.md](../SKILL.md) for complete agent instructions.
+
+## Architecture
+
+- `index.ts` — CLI entry point (chrome / generate / serve)
+- `lib/browser.ts` — Browser automation (Chrome CDP + Playwright)
+- `lib/tasks.ts` — Async task queue
+- `dist/index.js` — Compiled Node.js bundle
+
+## Session
+
+Chrome stores the login session in `~/.phantom-canvas/chrome-profile/`. The first run requires `--headed` so you can log in interactively. After that, the login persists and headless mode works.
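
The session rule in AGENTS.md suggests a simple pre-flight check for scripts. The sketch below is a heuristic, assuming that a missing profile directory means no login has happened yet. The reverse does not hold: an existing directory does not prove the session is still valid. `first_run_flags` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path

# Profile location as documented in SKILL.md and AGENTS.md.
PROFILE_DIR = Path.home() / ".phantom-canvas" / "chrome-profile"


def first_run_flags(profile_dir=PROFILE_DIR):
    """Return ['--headed'] when no Chrome profile exists yet, else []."""
    return [] if profile_dir.is_dir() else ["--headed"]
```

The returned list can be appended to a `phantom-canvas generate` argv so that the first run opens a visible window and later runs stay headless.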
--- a/skills/phantom-canvas/references/diagram.png
+++ b/skills/phantom-canvas/references/diagram.png
--- a/skills/phantom-canvas/references/examples.md
+++ b/skills/phantom-canvas/references/examples.md
@@ -0,0 +1,307 @@
+# Phantom Canvas — Usage Examples
+
+## Setup
+
+```bash
+bun install -g github:baixianger/phantom-canvas
+phantom-canvas chrome   # first time: log in to Google
+phantom-canvas serve    # start server on :8420
+```
+
+---
+
+## 1. Text-to-Image
+
+Generate an image from a text prompt. Each request starts a new Gemini conversation.
+
+```bash
+# Submit
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Isometric pixel art knight with sword and shield, Final Fantasy Tactics style, on solid green #00FF00 chroma-key background, standing idle pose"
+  }'
+# => {"task_id": "abc123", "status": "queued"}
+
+# Poll status
+curl localhost:8420/task/abc123
+# => {"status": "completed", "conversation_id": "05695dfd143c4dad", "images": [...]}
+
+# Download image
+curl localhost:8420/task/abc123/image/0 -o knight.png
+```
+
+---
+
+## 2. Image-to-Image (Reference Upload)
+
+Upload a reference image so Gemini keeps the same character design.
+
+```bash
+# Generate anchor sprite first
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "SE-facing isometric pixel art pirate, red bandana, blue tunic, FFT style, #00FF00 bg"}'
+# Wait, download → pirate.png
+
+# Use it as reference for 4-direction sheet
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Using the uploaded character, create a 2x2 sprite sheet: top-left=North (back), top-right=East (right side), bottom-left=South (front), bottom-right=SE (same as ref). Same pixel art style, same green #00FF00 background.",
+    "reference_images": ["/absolute/path/to/pirate.png"]
+  }'
+```
+
+> `reference_images` must be **absolute local file paths**. The browser uploads them through Gemini's file upload UI.
+
+---
+
+## 3. Multi-Turn Conversation
+
+Continue in the same Gemini chat to iterate on a design. Pass `conversation_id` from a previous task.
+
+```bash
+# Step 1: initial generation
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Pixel art knight character, isometric, green background"}'
+# => {"task_id": "aaa", "status": "queued"}
+
+# Get conversation_id from result
+curl localhost:8420/task/aaa
+# => {"conversation_id": "05695dfd143c4dad", "images": [...]}
+
+# Step 2: refine in same conversation — Gemini remembers context
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Now make the sword larger and add a red cape",
+    "conversation_id": "05695dfd143c4dad"
+  }'
+
+# Step 3: generate variations
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Show this character from 4 different angles in a 2x2 grid",
+    "conversation_id": "05695dfd143c4dad"
+  }'
+```
+
+> Multi-turn is useful for iterative design. Gemini keeps the visual context from previous messages.
+
+---
+
+## 4. Video Generation
+
+Generate walk cycle animations. Takes 1-2 minutes. Gemini has daily video quotas.
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Short looping video of a pixel art knight walking in place, isometric view, Final Fantasy Tactics style",
+    "type": "video",
+    "timeout_secs": 300
+  }'
+```
+
+With reference image:
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Looping walk cycle animation of this exact character",
+    "reference_images": ["/path/to/knight.png"],
+    "type": "video",
+    "timeout_secs": 300
+  }'
+```
+
+---
+
+## 5. Webhook Callback
+
+Get notified when generation completes instead of polling.
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "pixel art mage with staff, isometric, green bg",
+    "callback_url": "http://localhost:3000/webhook"
+  }'
+```
+
+Your webhook receives:
+
+```json
+{
+  "task_id": "abc123",
+  "status": "completed",
+  "images": [{"index": 0, "url": "/task/abc123/image/0"}]
+}
+```
+
+---
+
+## 6. Full Pipeline — Game Asset Turnaround
+
+Complete workflow for generating an 8-way isometric sprite sheet:
+
+```bash
+API=http://localhost:8420
+OUT=./sprites
+mkdir -p "$OUT"
+
+# Submit a JSON body to /generate and print the task ID
+submit() {
+  curl -s -X POST "$API/generate" \
+    -H "Content-Type: application/json" \
+    -d "$1" | jq -r .task_id
+}
+
+# Poll every $2 seconds (default 10) until the task completes; abort on failure.
+# Tasks start as "queued", so waiting only while "running" would exit too early.
+wait_for() {
+  while :; do
+    STATUS=$(curl -s "$API/task/$1" | jq -r .status)
+    [ "$STATUS" = "completed" ] && return 0
+    [ "$STATUS" = "failed" ] && { echo "Task $1 failed" >&2; exit 1; }
+    sleep "${2:-10}"
+  done
+}
+
+# Stage 1: Anchor sprite
+TASK=$(submit '{
+  "prompt": "Single SE-facing isometric pixel art knight, dark armor, red cape, sword and shield, FFT style, solid #00FF00 green background, no shadow"
+}')
+
+echo "Stage 1: $TASK"
+wait_for "$TASK"
+curl -s "$API/task/$TASK/image/0" -o "$OUT/anchor.png"
+CONV=$(curl -s "$API/task/$TASK" | jq -r .conversation_id)
+echo "Anchor saved. Conversation: $CONV"
+
+# Stage 2: Cardinal facings (multi-turn, Gemini remembers the knight)
+TASK=$(submit "{
+  \"prompt\": \"Now create a 2x2 sprite sheet of this SAME knight from 4 angles: top-left=North (back view), top-right=East (right side), bottom-left=South (front view), bottom-right=SE (same as before). Same style, same green background.\",
+  \"conversation_id\": \"$CONV\"
+}")
+
+echo "Stage 2: $TASK"
+wait_for "$TASK"
+curl -s "$API/task/$TASK/image/0" -o "$OUT/cardinals.png"
+
+# Stage 3: Diagonal facings
+TASK=$(submit "{
+  \"prompt\": \"Now create 4 diagonal views in a 2x2 grid: NW (mostly back + left side), NE (mostly back + right side), SW (mostly front + left side), SE (mostly front + right side). Same character, same style.\",
+  \"conversation_id\": \"$CONV\"
+}")
+
+echo "Stage 3: $TASK"
+wait_for "$TASK"
+curl -s "$API/task/$TASK/image/0" -o "$OUT/diagonals.png"
+
+# Stage 4: Assembly (local code, no API needed)
+# python3 assemble.py $OUT/cardinals.png $OUT/diagonals.png $OUT/turnaround.png
+
+# Stage 5: Walk animation
+TASK=$(submit "{
+  \"prompt\": \"Create a short looping walk cycle video of this knight, isometric SW-facing, walking in place\",
+  \"conversation_id\": \"$CONV\",
+  \"type\": \"video\",
+  \"timeout_secs\": 300
+}")
+
+echo "Stage 5: $TASK"
+wait_for "$TASK" 15
+curl -s "$API/task/$TASK/image/0" -o "$OUT/walk.mp4"
+
+echo "Done! Files in $OUT/"
+ls -la "$OUT/"
+```
+
+---
+
+## 7. TypeScript/Bun Client
+
+```typescript
+const API = "http://localhost:8420";
+
+async function generate(opts: {
+  prompt: string;
+  type?: "image" | "video";
+  referenceImages?: string[];
+  conversationId?: string;
+}) {
+  // Submit task
+  const { task_id } = await fetch(`${API}/generate`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({
+      prompt: opts.prompt,
+      type: opts.type ?? "image",
+      reference_images: opts.referenceImages,
+      conversation_id: opts.conversationId,
+    }),
+  }).then((r) => r.json());
+
+  // Poll until done
+  while (true) {
+    const task = await fetch(`${API}/task/${task_id}`).then((r) => r.json());
+    if (task.status === "completed") return task;
+    if (task.status === "failed") throw new Error(task.error);
+    await Bun.sleep(5000);
+  }
+}
+
+// Usage
+const anchor = await generate({
+  prompt: "Isometric pixel art knight, FFT style, green #00FF00 bg",
+});
+console.log("Anchor:", anchor.images[0].url);
+
+// Multi-turn: iterate on the same character
+const refined = await generate({
+  prompt: "Make the sword bigger and add a glowing effect",
+  conversationId: anchor.conversation_id,
+});
+
+// Save image
+const img = await fetch(`${API}${refined.images[0].url}`);
+await Bun.write("knight.png", img);
+```
+
+---
+
+## 8. Python Client
+
+```python
+import requests, time
+
+API = "http://localhost:8420"
+
+def generate(prompt, type="image", reference_images=None, conversation_id=None, timeout=180):
+    """Submit generation task and wait for result."""
+    resp = requests.post(f"{API}/generate", json={
+        "prompt": prompt,
+        "type": type,
+        "reference_images": reference_images,
+        "conversation_id": conversation_id,
+        "timeout_secs": timeout,
+    })
+    task_id = resp.json()["task_id"]
+
+    while True:
+        task = requests.get(f"{API}/task/{task_id}").json()
+        if task["status"] == "completed":
+            return task
+        if task["status"] == "failed":
+            raise RuntimeError(task["error"])
+        time.sleep(5)
+
+def download(task, index=0, path="output.png"):
+    """Download generated file."""
+    url = f"{API}{task['images'][index]['url']}"
+    with open(path, "wb") as f:
+        f.write(requests.get(url).content)
+
+# Text-to-image
+result = generate("pixel art knight, isometric, green bg")
+download(result, path="knight.png")
+
+# Multi-turn
+result2 = generate(
+    "Now show 4 directions in a 2x2 grid",
+    conversation_id=result["conversation_id"]
+)
+download(result2, path="directions.png")
+
+# Video
+video = generate(
+    "Walk cycle animation of this knight",
+    type="video",
+    conversation_id=result["conversation_id"],
+    timeout=300
+)
+download(video, path="walk.mp4")
+```
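
The client loops in examples.md poll until the task reports `completed` or `failed`, so a task that stays `queued` or `running` would block the caller indefinitely. Below is a minimal sketch of a bounded poll, assuming the `/task/<id>` response shape shown in the examples. `poll_task` and its `fetch`, `clock`, and `sleep` parameters are hypothetical names, injected so the loop can be exercised without a running server.

```python
import time


def poll_task(fetch, task_id, deadline_secs=180.0, interval_secs=5.0,
              clock=time.monotonic, sleep=time.sleep):
    """Call fetch(task_id) until the task completes, fails, or the deadline passes."""
    start = clock()
    while True:
        task = fetch(task_id)
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(task.get("error", "task failed"))
        if clock() - start >= deadline_secs:
            raise TimeoutError(
                f"task {task_id} still {task['status']} after {deadline_secs}s")
        sleep(interval_secs)
```

With the `requests`-based client above, `fetch` could be `lambda tid: requests.get(f"{API}/task/{tid}").json()`. Choosing a client-side deadline somewhat longer than the `timeout_secs` sent to the server leaves room for the server to report its own timeout first.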