Kunthawat Greethong a7477db220 feat: add phantom-canvas skill for Gemini image generation

2026-05-26 12:43:48 +07:00

name	description	allowed-tools
phantom-canvas	CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys — uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.	Bash

Phantom Canvas

CLI + HTTP API for image and video generation through Gemini Web. No API keys and no billing: just your Google account. A persistent Chrome instance runs in the background and is driven over the Chrome DevTools Protocol (CDP) to automate Gemini's web UI.

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.

Quick Check

# Verify installation
phantom-canvas --help

If you see the logo and command list, it's installed. If not:

npm install -g phantom-canvas

First-Time Setup (Required Once)

On first use, Chrome must open in a visible window so you can log in to your Google account:

# Open Chrome — a window appears. Log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed

After logging in once, the session persists at ~/.phantom-canvas/chrome-profile/. All future runs can be headless.

If you see "Session expired", tell the user to run phantom-canvas chrome and log in again. Do NOT automate login: Google's sign-in flow requires a human.
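An agent can detect this case and hand it off to the user instead of retrying blindly. This is a minimal sketch. The helper name pc_generate_checked is hypothetical, and the text above does not say whether "Session expired" is printed to stdout or stderr, so the sketch checks both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a generation, but stop with a human-facing message
# when the Google session has expired (login must NOT be automated).
# Assumption: the literal text "Session expired" appears on stdout or stderr.
pc_generate_checked() {
  local err_file out rc
  err_file=$(mktemp) || return 1
  # stdout carries the JSON result; stderr (logs) is captured to a temp file.
  out=$(phantom-canvas generate "$@" 2>"$err_file")
  rc=$?
  if grep -q "Session expired" "$err_file" ||
     printf '%s\n' "$out" | grep -q "Session expired"; then
    rm -f "$err_file"
    echo "Google session expired: ask the user to run 'phantom-canvas chrome' and log in again." >&2
    return 2   # distinct code so callers can tell this apart from other failures
  fi
  cat "$err_file" >&2   # pass the tool's own logs through unchanged
  rm -f "$err_file"
  printf '%s\n' "$out"
  return "$rc"
}
```

A return code of 2 means "stop and ask the user". Any other non-zero code is an ordinary failure that may be worth retrying.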

CLI Generate (for agents and scripts)

Output is JSON on stdout; logs go to stderr. Images are downloaded at full resolution (1024px).

Text-to-Image

# Simple generate
phantom-canvas generate "your prompt here" -o output.png

Returns JSON:

{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
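Because stdout carries only the JSON result, a script can check the status before trusting the output file. This is a minimal sketch that assumes jq is installed. The helper name pc_generate_path is hypothetical, and since only the "completed" status is documented, any other value is treated as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run `phantom-canvas generate` with the given arguments
# and print the saved path only if the JSON result says "completed".
pc_generate_path() {
  local result status
  # JSON goes to stdout; the tool's logs stay on stderr.
  result=$(phantom-canvas generate "$@") || return 1
  status=$(printf '%s' "$result" | jq -r '.status')
  if [ "$status" != "completed" ]; then
    echo "generation did not complete: $result" >&2
    return 1
  fi
  printf '%s\n' "$result" | jq -r '.path'
}
```

Usage: hero=$(pc_generate_path "minimalist hero banner for a tech startup website" -o hero.png) || exit 1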

Image-to-Image (Reference Upload)

Use an existing image as visual reference:

phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png

The reference path may be absolute or relative to the current working directory.

Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine:

# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png

Video Generation

phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4

Video generation takes 1-2 minutes, and Gemini enforces a daily video quota.

CLI Options

Flag	Description
`-o, --output <file>`	Output file path
`--ref <file>`	Reference image (absolute or relative to cwd)
`--video`	Generate video instead of image
`--conversation <id>`	Continue previous conversation
`--timeout <secs>`	Timeout in seconds (default: 180 for images, 300 for video)
`--headed`	Show browser window (default: headless)
`--cdp <url>`	Chrome DevTools URL (default: http://127.0.0.1:9222)

HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

# Start the server (--port is optional; the examples below assume 8420)
phantom-canvas serve --port 8420

Generate an Image

curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'

Returns immediately with a task ID:

{"task_id": "abc123", "status": "queued"}

Check Task Status

curl localhost:8420/task/abc123

{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}

Download Result

curl localhost:8420/task/abc123/image/0 -o result.png
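The submit, poll and download cycle can be wrapped in a single function. This sketch makes several assumptions:

- The server listens on localhost:8420.
- jq is installed.
- The helper name pc_api_generate and the PC_BASE variable are hypothetical.
- Only the "queued" and "completed" statuses are documented here, so every other status is treated as still in progress until a deadline passes.

```shell
#!/usr/bin/env bash
# Base URL of `phantom-canvas serve` (hypothetical variable; override as needed).
PC_BASE=${PC_BASE:-http://localhost:8420}

# Hypothetical helper: submit a prompt, poll /task/<id>, download the first image.
pc_api_generate() {
  local prompt=$1 out=$2 deadline=${3:-300}
  local body task_id task status url waited=0
  # jq builds the JSON body so quotes inside the prompt are escaped correctly.
  body=$(jq -n --arg p "$prompt" '{prompt: $p}') || return 1
  task_id=$(curl -fsS -X POST "$PC_BASE/generate" \
    -H "Content-Type: application/json" -d "$body" | jq -r '.task_id')
  if [ -z "$task_id" ] || [ "$task_id" = "null" ]; then
    echo "submit failed" >&2
    return 1
  fi
  while [ "$waited" -lt "$deadline" ]; do
    task=$(curl -fsS "$PC_BASE/task/$task_id") || return 1
    status=$(printf '%s' "$task" | jq -r '.status')
    if [ "$status" = "completed" ]; then
      # images[].url is a server-relative path such as /task/<id>/image/0.
      url=$(printf '%s' "$task" | jq -r '.images[0].url')
      curl -fsS "$PC_BASE$url" -o "$out" || return 1
      printf '%s\n' "$out"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "task $task_id not completed within ${deadline}s (last status: $status)" >&2
  return 1
}
```

Usage: pc_api_generate "Isometric pixel art knight, green #00FF00 bg" knight.png 300

Polling every 5 seconds is an arbitrary choice. A callback_url avoids polling entirely if your app can receive webhooks.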

With Reference Image

curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'

reference_images must be absolute local file paths.

Check Server Health

curl localhost:8420/health
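When a script starts the server itself, it should wait until /health responds before submitting work. This is a minimal sketch. The name pc_wait_healthy is hypothetical, and the sketch checks only for a successful HTTP status because the body of the /health response is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll <base>/health once per second, up to <tries> times.
pc_wait_healthy() {
  local base=${1:-http://localhost:8420} tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # -f makes curl exit non-zero on HTTP errors; output is discarded.
    if curl -fsS "$base/health" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "phantom-canvas server at $base did not become healthy" >&2
  return 1
}
```

Usage: start the server in the background with phantom-canvas serve --port 8420 &, then run pc_wait_healthy http://localhost:8420 before the first POST /generate.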

API Parameters (POST /generate)

Field	Type	Default	Description
`prompt`	string	required	Generation prompt
`type`	`"image"` or `"video"`	`"image"`	Output type
`reference_images`	string[]	—	Absolute paths to reference images
`num_images`	number	1	How many images to generate
`timeout_secs`	number	180 / 300	Timeout in seconds (image / video)
`callback_url`	string	—	Webhook URL for async completion
`conversation_id`	string	—	Continue existing conversation
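Building the request body with jq avoids quoting bugs when prompts contain quotes. Running realpath on each reference turns relative paths into the absolute paths that reference_images requires. This sketch assumes jq and realpath are available. The helper name and its argument order (prompt, type, then any reference files) are hypothetical; the field names come from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pc_request_body <prompt> <image|video> [ref-file ...]
# Prints a JSON body for POST /generate.
pc_request_body() {
  local prompt=$1 type=$2 refs='[]' ref abs
  shift 2
  for ref in "$@"; do
    # Check existence explicitly; realpath alone may accept a missing file.
    if [ ! -f "$ref" ]; then
      echo "reference image not found: $ref" >&2
      return 1
    fi
    abs=$(realpath "$ref") || return 1
    refs=$(printf '%s' "$refs" | jq -c --arg p "$abs" '. + [$p]')
  done
  # Include reference_images only when at least one reference was given.
  jq -n --arg prompt "$prompt" --arg type "$type" --argjson refs "$refs" \
    '{prompt: $prompt, type: $type}
     + (if ($refs | length) > 0 then {reference_images: $refs} else {} end)'
}
```

Usage: curl -X POST localhost:8420/generate -H "Content-Type: application/json" -d "$(pc_request_body "same character from 4 angles in a 2x2 grid" image ./sprite.png)"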

Best Practices for Website Images

Use #00FF00 Green Background

Instead of asking for "transparent background", use:

"pixel art knight, isometric view, solid green #00FF00 chroma-key background"

Gemini renders a "transparent" background as a drawn checkerboard pattern, not real transparency. A solid green screen is easy to chroma-key out in post-processing.
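One way to do the chroma-key step is ImageMagick's -transparent operator with a -fuzz tolerance, because generated images rarely contain a perfectly uniform #00FF00. This sketch assumes ImageMagick 7 (the magick command; older installs use convert with the same options). The helper name and the 15% fuzz default are illustrative starting points, not tuned values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pc_chroma_key <input> <output.png> [fuzz]
# Turns pixels close to #00FF00 fully transparent.
pc_chroma_key() {
  local in=$1 out=$2 fuzz=${3:-15%}
  # -fuzz is a setting, so it must come before -transparent to take effect.
  # Write PNG (or another format with alpha) so the transparency is kept.
  magick "$in" -fuzz "$fuzz" -transparent '#00FF00' "$out"
}
```

Green fringes can remain on anti-aliased edges. A higher fuzz value removes more of them, at the risk of also removing green parts of the subject.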

Iterative Workflow

For website hero images or banners, use a multi-turn conversation:

1. Generate a base concept.
2. Use --conversation to refine colors, layout, and text.
3. Download the final version.
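These steps can be scripted as a loop that carries the conversation ID forward from the first round. This sketch assumes jq is installed. The name pc_refine and the hero-N.png file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pc_refine <base prompt> [refinement ...]
# Saves each round as hero-1.png, hero-2.png, ... and prints the last file name.
pc_refine() {
  local base=$1 conv result step i=1
  shift
  # Round 1: the base concept starts a new Gemini conversation.
  result=$(phantom-canvas generate "$base" -o "hero-$i.png") || return 1
  conv=$(printf '%s' "$result" | jq -r '.conversation_id')
  if [ -z "$conv" ] || [ "$conv" = "null" ]; then
    echo "no conversation_id in: $result" >&2
    return 1
  fi
  # Later rounds: each refinement continues the same conversation.
  for step in "$@"; do
    i=$((i + 1))
    phantom-canvas generate "$step" --conversation "$conv" -o "hero-$i.png" >/dev/null || return 1
  done
  printf 'hero-%s.png\n' "$i"   # the last round is the final version
}
```

Usage: final=$(pc_refine "hero banner for a tech startup website, flat design" "use a darker blue palette" "leave empty space on the left for a headline")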

Prompt Tips

Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
Specify dimensions and composition when possible
For website images, mention the context: "hero banner for a tech startup website"

Error Handling

Error	Action
Chrome failed to start	Install Chrome or set Chrome path
"Session expired"	Run `phantom-canvas chrome` and log in again in the browser window
Timeout / empty images	Retry with a different prompt or a longer `--timeout`
Video quota exceeded	Wait until tomorrow (Gemini daily limit)
Port already in use	Start the server on another port, e.g. `--port 8430`
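For timeouts, a retry loop that doubles --timeout on each attempt is a simple starting point. The CLI's failure behavior is not documented here, so this sketch counts a run as failed if it exits non-zero or its JSON status is not "completed". It requires jq, and pc_generate_retry is a hypothetical name. It only varies the timeout; changing the prompt, the table's other suggestion, is left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry `phantom-canvas generate` up to 3 times with
# --timeout 180, 360, 720. Do not pass your own --timeout in the arguments.
pc_generate_retry() {
  local timeout=180 attempt result
  for attempt in 1 2 3; do
    if result=$(phantom-canvas generate "$@" --timeout "$timeout") &&
       [ "$(printf '%s' "$result" | jq -r '.status')" = "completed" ]; then
      printf '%s\n' "$result"
      return 0
    fi
    echo "attempt $attempt failed with --timeout ${timeout}s" >&2
    timeout=$((timeout * 2))
  done
  return 1
}
```

Usage: pc_generate_retry "pixel art knight, isometric view" -o knight.png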
