diff --git a/skills/phantom-canvas/SKILL.md b/skills/phantom-canvas/SKILL.md
new file mode 100644
index 0000000..3b87e07
--- /dev/null
+++ b/skills/phantom-canvas/SKILL.md
@@ -0,0 +1,215 @@
+---
+name: phantom-canvas
+description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys — uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
+allowed-tools: Bash
+---
+
+# Phantom Canvas
+
+**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing — just your Google account. A persistent Chrome browser runs in the background via CDP, automating Gemini's web UI.
+
+Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.
+
+![Phantom Canvas logo](logo.png)
+
+## Quick Check
+
+```bash
+# Verify installation
+phantom-canvas --help
+```
+
+If you see the logo and command list, it's installed. If not:
+
+```bash
+npm install -g phantom-canvas
+```
+
+## First-Time Setup (Required Once)
+
+The first time, Chrome needs to open visibly so you can log into your Google account:
+
+```bash
+# Open Chrome — a window appears. Log into Google in that window.
+phantom-canvas chrome
+
+# Or just generate with --headed:
+phantom-canvas generate "test image" --headed
+```
+
+After logging in once, the session persists at `~/.phantom-canvas/chrome-profile/`. All future runs can be headless.
+
+**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and re-login. Do NOT automate login — it requires human interaction with Google auth.
+
+## CLI Generate (for agents and scripts)
+
+Output is JSON on stdout. Logs go to stderr. Images downloaded at full resolution (1024px).
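Because the JSON result and the logs travel on separate streams, a wrapper script can parse stdout directly. The following is a minimal Python sketch, not part of phantom-canvas: `parse_result` and `run_generate` are hypothetical helper names, and it assumes stdout carries a single JSON object with the `status`, `path`, and `conversation_id` fields used in this document's examples. Treating any status other than `completed` as an error is also an assumption.

```python
import json
import subprocess


def parse_result(stdout: str) -> dict:
    """Parse the JSON object the CLI prints on stdout (assumed shape)."""
    result = json.loads(stdout)
    # Assumption: anything other than "completed" means the run did not succeed.
    if result.get("status") != "completed":
        raise RuntimeError(f"generation did not complete: {result}")
    return result


def run_generate(prompt: str, output: str) -> dict:
    """Run `phantom-canvas generate` and return its parsed JSON result."""
    proc = subprocess.run(
        ["phantom-canvas", "generate", prompt, "-o", output],
        stdout=subprocess.PIPE,  # capture the JSON result
        stderr=None,             # let logs pass through to the terminal
        text=True,
        check=True,              # raise if the CLI exits non-zero
    )
    return parse_result(proc.stdout)
```

A caller could then read `run_generate("pixel art knight", "knight.png")["conversation_id"]` and pass it to a follow-up `--conversation` run.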
+
+### Text-to-Image
+
+```bash
+# Simple generate
+phantom-canvas generate "your prompt here" -o output.png
+```
+
+Returns JSON:
+```json
+{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
+```
+
+### Image-to-Image (Reference Upload)
+
+Use an existing image as visual reference:
+
+```bash
+phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
+```
+
+Reference path must be absolute or relative to cwd.
+
+### Multi-Turn (Iterative Design)
+
+Continue in the same Gemini conversation to refine:
+
+```bash
+# Round 1
+RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
+CONV=$(echo "$RESULT" | jq -r .conversation_id)
+
+# Round 2 — Gemini remembers the character
+phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png
+
+# Round 3
+phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
+```
+
+### Video Generation
+
+```bash
+phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
+```
+
+Video takes 1-2 minutes. Gemini has daily video quotas.
+
+### CLI Options
+
+| Flag | Description |
+|---|---|
+| `-o, --output ` | Output file path |
+| `--ref ` | Reference image (absolute path, or relative to cwd) |
+| `--video` | Generate video instead of image |
+| `--conversation ` | Continue previous conversation |
+| `--timeout ` | Timeout in seconds (default: 180 image, 300 video) |
+| `--headed` | Show browser window (default: headless) |
+| `--cdp ` | Chrome DevTools URL (default: http://127.0.0.1:9222) |
+
+## HTTP API (Server Mode)
+
+For programmatic access from apps, pipelines, or webhooks:
+
+```bash
+# Start the server
+phantom-canvas serve [--port 8420]
+```
+
+### Generate an Image
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
+```
+
+Returns immediately with a task ID:
+```json
+{"task_id": "abc123", "status": "queued"}
+```
+
+### Check Task Status
+
+```bash
+curl localhost:8420/task/abc123
+```
+
+```json
+{
+  "task_id": "abc123",
+  "status": "completed",
+  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
+  "conversation_id": "05695dfd143c4dad",
+  "elapsed_secs": 45.3
+}
+```
+
+### Download Result
+
+```bash
+curl localhost:8420/task/abc123/image/0 -o result.png
+```
+
+### With Reference Image
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "same character from 4 angles in a 2x2 grid",
+    "reference_images": ["/absolute/path/to/sprite.png"]
+  }'
+```
+
+> `reference_images` must be **absolute local file paths**.
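Since `reference_images` must be absolute, a client can resolve paths before submitting the request. A small Python sketch under that assumption: `build_payload` is a hypothetical helper, not part of phantom-canvas, and it only uses the `prompt` and `reference_images` fields shown in this document.

```python
import json
import os


def build_payload(prompt: str, reference_images=None) -> str:
    """Build a POST /generate body with reference paths made absolute."""
    body = {"prompt": prompt}
    if reference_images:
        # Resolve relative paths against the current working directory.
        body["reference_images"] = [os.path.abspath(p) for p in reference_images]
    return json.dumps(body)
```

The returned string can be sent as the request body with a `Content-Type: application/json` header. This presumably only helps when the client and the server see the same filesystem, since the paths are local to the machine running Chrome.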
+
+### Check Server Health
+
+```bash
+curl localhost:8420/health
+```
+
+### API Parameters (POST /generate)
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `prompt` | string | required | Generation prompt |
+| `type` | `"image"` \| `"video"` | `"image"` | Output type |
+| `reference_images` | string[] | — | Absolute paths to reference images |
+| `num_images` | number | 1 | How many images to generate |
+| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
+| `callback_url` | string | — | Webhook URL for async completion |
+| `conversation_id` | string | — | Continue existing conversation |
+
+## Best Practices for Website Images
+
+### Use #00FF00 Green Background
+
+Instead of asking for "transparent background", use:
+
+```
+"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
+```
+
+Gemini interprets "transparent" as a checkerboard pattern. Green screen is easy to chroma-key in post-processing.
+
+### Iterative Workflow
+
+For website hero images or banners, use multi-turn:
+
+1. First generate a base concept
+2. Use `--conversation` to refine colors, layout, text
+3. Download the final version
+
+### Prompt Tips
+
+- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
+- Specify dimensions and composition when possible
+- For website images, mention the context: "hero banner for a tech startup website"
+
+## Error Handling
+
+| Error | Action |
+|---|---|
+| Chrome failed to start | Install Chrome or set Chrome path |
+| "Session expired" | Run `phantom-canvas chrome`, re-login in the browser window |
+| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
+| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
+| Port already in use | Use `--port 8430` or a different port |
diff --git a/skills/phantom-canvas/logo.png b/skills/phantom-canvas/logo.png
new file mode 100644
index 0000000..c3a21bd
Binary files /dev/null and b/skills/phantom-canvas/logo.png differ
diff --git a/skills/phantom-canvas/references/AGENTS.md b/skills/phantom-canvas/references/AGENTS.md
new file mode 100644
index 0000000..f62ad90
--- /dev/null
+++ b/skills/phantom-canvas/references/AGENTS.md
@@ -0,0 +1,28 @@
+# AGENTS.md
+
+CLI tool and HTTP API for AI image/video generation via Gemini Web.
+
+## What this tool does
+
+Phantom Canvas wraps Gemini Web as a programmable CLI and HTTP API. It launches Chrome via CDP, automates Gemini's web UI, and exposes generation capabilities for AI agents, scripts, and applications.
+
+## How to use
+
+```bash
+bun add -g phantom-canvas # or: npm install -g phantom-canvas
+phantom-canvas generate "your prompt" -o output.png --headed # first time: log in via Chrome
+phantom-canvas generate "your prompt" -o output.png # after that: headless
+```
+
+See [SKILL.md](../SKILL.md) for complete agent instructions.
+
+## Architecture
+
+- `index.ts` — CLI entry point (chrome / generate / serve)
+- `lib/browser.ts` — Browser automation (Chrome CDP + Playwright)
+- `lib/tasks.ts` — Async task queue
+- `dist/index.js` — Compiled Node.js bundle
+
+## Session
+
+Chrome stores the login in `~/.phantom-canvas/chrome-profile/`. The first run requires `--headed` so you can log in interactively. After that, the login persists and headless mode works.
diff --git a/skills/phantom-canvas/references/diagram.png b/skills/phantom-canvas/references/diagram.png
new file mode 100644
index 0000000..cfdad05
Binary files /dev/null and b/skills/phantom-canvas/references/diagram.png differ
diff --git a/skills/phantom-canvas/references/examples.md b/skills/phantom-canvas/references/examples.md
new file mode 100644
index 0000000..18bff36
--- /dev/null
+++ b/skills/phantom-canvas/references/examples.md
@@ -0,0 +1,307 @@
+# Phantom Canvas — Usage Examples
+
+## Setup
+
+```bash
+bun install -g github:baixianger/phantom-canvas
+phantom-canvas chrome # first time: log in to Google
+phantom-canvas serve # start server on :8420
+```
+
+---
+
+## 1. Text-to-Image
+
+Generate an image from a text prompt. Each request starts a new Gemini conversation.
+
+```bash
+# Submit
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Isometric pixel art knight with sword and shield, Final Fantasy Tactics style, on solid green #00FF00 chroma-key background, standing idle pose"
+  }'
+# => {"task_id": "abc123", "status": "queued"}
+
+# Poll status
+curl localhost:8420/task/abc123
+# => {"status": "completed", "conversation_id": "05695dfd143c4dad", "images": [...]}
+
+# Download image
+curl localhost:8420/task/abc123/image/0 -o knight.png
+```
+
+---
+
+## 2. Image-to-Image (Reference Upload)
+
+Upload a reference image so Gemini keeps the same character design.
+
+```bash
+# Generate anchor sprite first
+curl -X POST localhost:8420/generate -H "Content-Type: application/json" \
+  -d '{"prompt": "SE-facing isometric pixel art pirate, red bandana, blue tunic, FFT style, #00FF00 bg"}'
+# Wait, download → pirate.png
+
+# Use it as reference for 4-direction sheet
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Using the uploaded character, create a 2x2 sprite sheet: top-left=North (back), top-right=East (right side), bottom-left=South (front), bottom-right=SE (same as ref). Same pixel art style, same green #00FF00 background.",
+    "reference_images": ["/absolute/path/to/pirate.png"]
+  }'
+```
+
+> `reference_images` must be **absolute local file paths**. The browser uploads them through Gemini's file upload UI.
+
+---
+
+## 3. Multi-Turn Conversation
+
+Continue in the same Gemini chat to iterate on a design. Pass `conversation_id` from a previous task.
+
+```bash
+# Step 1: initial generation
+curl -X POST localhost:8420/generate -H "Content-Type: application/json" \
+  -d '{"prompt": "Pixel art knight character, isometric, green background"}'
+# => {"task_id": "aaa", "status": "queued"}
+
+# Get conversation_id from result
+curl localhost:8420/task/aaa
+# => {"conversation_id": "05695dfd143c4dad", "images": [...]}
+
+# Step 2: refine in same conversation — Gemini remembers context
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Now make the sword larger and add a red cape",
+    "conversation_id": "05695dfd143c4dad"
+  }'
+
+# Step 3: generate variations
+curl -X POST localhost:8420/generate -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Show this character from 4 different angles in a 2x2 grid",
+    "conversation_id": "05695dfd143c4dad"
+  }'
+```
+
+> Multi-turn is useful for iterative design. Gemini keeps the visual context from previous messages.
+
+---
+
+## 4. Video Generation
+
+Generate walk cycle animations. Takes 1-2 minutes. Gemini has daily video quotas.
+
+```bash
+curl -X POST localhost:8420/generate \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Short looping video of a pixel art knight walking in place, isometric view, Final Fantasy Tactics style",
+    "type": "video",
+    "timeout_secs": 300
+  }'
+```
+
+With reference image:
+
+```bash
+curl -X POST localhost:8420/generate -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "Looping walk cycle animation of this exact character",
+    "reference_images": ["/path/to/knight.png"],
+    "type": "video",
+    "timeout_secs": 300
+  }'
+```
+
+---
+
+## 5. Webhook Callback
+
+Get notified when generation completes instead of polling.
+
+```bash
+curl -X POST localhost:8420/generate -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "pixel art mage with staff, isometric, green bg",
+    "callback_url": "http://localhost:3000/webhook"
+  }'
+```
+
+Your webhook receives:
+
+```json
+{
+  "task_id": "abc123",
+  "status": "completed",
+  "images": [{"index": 0, "url": "/task/abc123/image/0"}]
+}
+```
+
+---
+
+## 6. Full Pipeline — Game Asset Turnaround
+
+Complete workflow for generating an 8-way isometric sprite sheet:
+
+```bash
+API=http://localhost:8420
+OUT=./sprites && mkdir -p "$OUT"
+
+# Stage 1: Anchor sprite
+TASK=$(curl -s -X POST $API/generate -H "Content-Type: application/json" -d '{
+  "prompt": "Single SE-facing isometric pixel art knight, dark armor, red cape, sword and shield, FFT style, solid #00FF00 green background, no shadow"
+}' | jq -r .task_id)
+
+echo "Stage 1: $TASK"
+until curl -s $API/task/$TASK | jq -e '.status == "completed" or .status == "failed"' >/dev/null; do sleep 10; done
+curl -s $API/task/$TASK/image/0 -o $OUT/anchor.png
+CONV=$(curl -s $API/task/$TASK | jq -r .conversation_id)
+echo "Anchor saved. Conversation: $CONV"
+
+# Stage 2: Cardinal facings (multi-turn, Gemini remembers the knight)
+TASK=$(curl -s -X POST $API/generate -H "Content-Type: application/json" -d "{
+  \"prompt\": \"Now create a 2x2 sprite sheet of this SAME knight from 4 angles: top-left=North (back view), top-right=East (right side), bottom-left=South (front view), bottom-right=SE (same as before). Same style, same green background.\",
+  \"conversation_id\": \"$CONV\"
+}" | jq -r .task_id)
+
+echo "Stage 2: $TASK"
+until curl -s $API/task/$TASK | jq -e '.status == "completed" or .status == "failed"' >/dev/null; do sleep 10; done
+curl -s $API/task/$TASK/image/0 -o $OUT/cardinals.png
+
+# Stage 3: Diagonal facings
+TASK=$(curl -s -X POST $API/generate -H "Content-Type: application/json" -d "{
+  \"prompt\": \"Now create 4 diagonal views in a 2x2 grid: NW (mostly back + left side), NE (mostly back + right side), SW (mostly front + left side), SE (mostly front + right side). Same character, same style.\",
+  \"conversation_id\": \"$CONV\"
+}" | jq -r .task_id)
+
+echo "Stage 3: $TASK"
+until curl -s $API/task/$TASK | jq -e '.status == "completed" or .status == "failed"' >/dev/null; do sleep 10; done
+curl -s $API/task/$TASK/image/0 -o $OUT/diagonals.png
+
+# Stage 4: Assembly (local code, no API needed)
+# python3 assemble.py $OUT/cardinals.png $OUT/diagonals.png $OUT/turnaround.png
+
+# Stage 5: Walk animation
+TASK=$(curl -s -X POST $API/generate -H "Content-Type: application/json" -d "{
+  \"prompt\": \"Create a short looping walk cycle video of this knight, isometric SW-facing, walking in place\",
+  \"conversation_id\": \"$CONV\",
+  \"type\": \"video\",
+  \"timeout_secs\": 300
+}" | jq -r .task_id)
+
+echo "Stage 5: $TASK"
+until curl -s $API/task/$TASK | jq -e '.status == "completed" or .status == "failed"' >/dev/null; do sleep 15; done
+curl -s $API/task/$TASK/image/0 -o $OUT/walk.mp4
+
+echo "Done! Files in $OUT/"
+ls -la $OUT/
+```
+
+---
+
+## 7. TypeScript/Bun Client
+
+```typescript
+const API = "http://localhost:8420";
+
+async function generate(opts: {
+  prompt: string;
+  type?: "image" | "video";
+  referenceImages?: string[];
+  conversationId?: string;
+}) {
+  // Submit task
+  const { task_id } = await fetch(`${API}/generate`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({
+      prompt: opts.prompt,
+      type: opts.type ?? "image",
+      reference_images: opts.referenceImages,
+      conversation_id: opts.conversationId,
+    }),
+  }).then((r) => r.json());
+
+  // Poll until done
+  while (true) {
+    const task = await fetch(`${API}/task/${task_id}`).then((r) => r.json());
+    if (task.status === "completed") return task;
+    if (task.status === "failed") throw new Error(task.error);
+    await Bun.sleep(5000);
+  }
+}
+
+// Usage
+const anchor = await generate({
+  prompt: "Isometric pixel art knight, FFT style, green #00FF00 bg",
+});
+console.log("Anchor:", anchor.images[0].url);
+
+// Multi-turn: iterate on the same character
+const refined = await generate({
+  prompt: "Make the sword bigger and add a glowing effect",
+  conversationId: anchor.conversation_id,
+});
+
+// Save image
+const img = await fetch(`${API}${refined.images[0].url}`);
+await Bun.write("knight.png", img);
+```
+
+---
+
+## 8. Python Client
+
+```python
+import requests, time
+
+API = "http://localhost:8420"
+
+def generate(prompt, type="image", reference_images=None, conversation_id=None, timeout=180):
+    """Submit generation task and wait for result."""
+    resp = requests.post(f"{API}/generate", json={
+        "prompt": prompt,
+        "type": type,
+        "reference_images": reference_images,
+        "conversation_id": conversation_id,
+        "timeout_secs": timeout,
+    })
+    task_id = resp.json()["task_id"]
+
+    while True:
+        task = requests.get(f"{API}/task/{task_id}").json()
+        if task["status"] == "completed":
+            return task
+        if task["status"] == "failed":
+            raise RuntimeError(task["error"])
+        time.sleep(5)
+
+def download(task, index=0, path="output.png"):
+    """Download generated file."""
+    url = f"{API}{task['images'][index]['url']}"
+    with open(path, "wb") as f:
+        f.write(requests.get(url).content)
+
+# Text-to-image
+result = generate("pixel art knight, isometric, green bg")
+download(result, path="knight.png")
+
+# Multi-turn
+result2 = generate(
+    "Now show 4 directions in a 2x2 grid",
+    conversation_id=result["conversation_id"]
+)
+download(result2, path="directions.png")
+
+# Video
+video = generate(
+    "Walk cycle animation of this knight",
+    type="video",
+    conversation_id=result["conversation_id"],
+    timeout=300
+)
+download(video, path="walk.mp4")
+```
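
The `while True` loop in the Python client waits forever if a task never reaches `completed` or `failed`. A bounded variant might look like the sketch below. `wait_for_task` is a hypothetical helper, not part of phantom-canvas, and the status names it checks are only the ones that appear in this document. The status lookup is passed in as a callable so the loop can be exercised without a running server.

```python
import time


def wait_for_task(fetch_status, timeout_secs=300, interval_secs=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the task finishes or the deadline passes."""
    deadline = clock() + timeout_secs
    while True:
        task = fetch_status()
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(task.get("error", "generation failed"))
        # Any other status (e.g. "queued", "running") counts as still in progress.
        if clock() >= deadline:
            raise TimeoutError(f"task still {task['status']} after {timeout_secs}s")
        sleep(interval_secs)
```

With the client above, this could be called as `wait_for_task(lambda: requests.get(f"{API}/task/{task_id}").json())`.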