---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys; uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---
# Phantom Canvas

CLI + HTTP API for image and video generation through Gemini Web. No API keys and no billing: all it needs is your Google account. A persistent Chrome browser runs in the background and automates Gemini's web UI over CDP (the Chrome DevTools Protocol).

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.
## Quick Check

```bash
# Verify installation
phantom-canvas --help
```

If you see the logo and command list, it's installed. If not:

```bash
npm install -g phantom-canvas
```
## First-Time Setup (Required Once)

On first use, Chrome must open in a visible window so you can log into your Google account:

```bash
# Open Chrome; a window appears. Log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```
After you log in once, the session persists in `~/.phantom-canvas/chrome-profile/`, so all future runs can be headless.

When you see "Session expired", tell the user to run `phantom-canvas chrome` and log in again. Do NOT automate login: Google's auth flow requires human interaction.
## CLI Generate (for agents and scripts)

Output is JSON on stdout; logs go to stderr. Images are downloaded at full resolution (1024px).
### Text-to-Image

```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```

Returns JSON:

```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```
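Here is a minimal Bash sketch of how a script might consume this result. It assumes the compact, single-line JSON shown above and extracts fields with `sed`, so it needs only POSIX tools. When `jq` is installed, it is the more robust choice. `pc_field`, `generate_image`, and the `PC_BIN` override (useful for pointing at a stub in tests) are hypothetical names, not part of phantom-canvas.

```bash
# Print the first string value for a JSON key read from stdin.
# Crude on purpose: assumes flat JSON with string values and no escaped quotes.
pc_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Generate an image and print its output path, failing unless status is "completed".
# PC_BIN is a hypothetical override for the CLI binary (defaults to phantom-canvas).
generate_image() {
  local prompt=$1 out=$2 result status
  # stdout is the JSON result; stderr (logs) still reaches the terminal
  result=$("${PC_BIN:-phantom-canvas}" generate "$prompt" -o "$out") || return 1
  status=$(printf '%s\n' "$result" | pc_field status)
  if [ "$status" != "completed" ]; then
    echo "generation did not complete (status: ${status:-unknown})" >&2
    return 1
  fi
  printf '%s\n' "$result" | pc_field path
}

# Usage:
# img=$(generate_image "pixel art knight, green bg" knight.png) || exit 1
```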
### Image-to-Image (Reference Upload)

Use an existing image as a visual reference:

```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```

The reference path must be absolute or relative to the current working directory.
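The CLI options table describes `--ref` as taking an absolute path, so a script can resolve relative paths itself before calling the CLI. Here is a small Bash sketch. `abs_ref` is a hypothetical helper, and it only checks that the file exists, not that it is a readable image.

```bash
# Print an absolute path for a reference image, or fail if the file does not exist.
abs_ref() {
  local p=$1
  if [ ! -f "$p" ]; then
    echo "reference image not found: $p" >&2
    return 1
  fi
  case $p in
    /*) printf '%s\n' "$p" ;;                  # already absolute
    *)  printf '%s/%s\n' "$PWD" "${p#./}" ;;    # relative to the current directory
  esac
}

# Usage:
# ref=$(abs_ref ./sprite.png) || exit 1
# phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref "$ref" -o sheet.png
```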
### Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine a result:

```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```
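The same rounds can also be driven from a list of refinements. Here is a minimal Bash sketch. It assumes every round returns the same compact JSON as the text-to-image example, and it extracts `conversation_id` with `sed` instead of `jq` so it has no extra dependencies. `refine_series`, the `stepN.png` naming, and the `PC_BIN` override are hypothetical.

```bash
# Run a base prompt, then apply each refinement in the same Gemini conversation.
# Usage: refine_series BASE_PROMPT [REFINEMENT...]
# Writes step0.png, step1.png, ... and prints the conversation ID.
refine_series() {
  local base=$1 result conv step i=1
  shift
  result=$("${PC_BIN:-phantom-canvas}" generate "$base" -o step0.png) || return 1
  # Assumes compact JSON like {"status":"completed",...,"conversation_id":"abc123"}
  conv=$(printf '%s\n' "$result" | sed -n 's/.*"conversation_id":"\([^"]*\)".*/\1/p')
  if [ -z "$conv" ]; then
    echo "no conversation_id in: $result" >&2
    return 1
  fi
  for step in "$@"; do
    "${PC_BIN:-phantom-canvas}" generate "$step" --conversation "$conv" -o "step$i.png" >/dev/null || return 1
    i=$((i + 1))
  done
  printf '%s\n' "$conv"
}

# Usage:
# refine_series "pixel art knight, green bg" "make the sword bigger" "show 4 directions"
```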
### Video Generation

```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```

Video generation takes 1-2 minutes, and Gemini enforces a daily video quota.
### CLI Options

| Flag | Description |
|---|---|
| `-o, --output <file>` | Output file path |
| `--ref <file>` | Reference image (absolute path) |
| `--video` | Generate video instead of image |
| `--conversation <id>` | Continue previous conversation |
| `--timeout <secs>` | Timeout in seconds (default: 180 for image, 300 for video) |
| `--headed` | Show browser window (default: headless) |
| `--cdp <url>` | Chrome DevTools URL (default: `http://127.0.0.1:9222`) |
## HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

```bash
# Start the server
phantom-canvas serve [--port 8420]
```
### Generate an Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```

Returns immediately with a task ID:

```json
{"task_id": "abc123", "status": "queued"}
```
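A script usually needs only the task ID. Here is a minimal Bash sketch, assuming `curl` is available and the server is running on the default port. `submit_task` and `PC_URL` are hypothetical names, and the `sed` extraction assumes the flat response shape shown above.

```bash
# POST a JSON body to /generate and print the returned task_id.
# PC_URL is a hypothetical override; it defaults to the server's usual address.
submit_task() {
  local body=$1 resp id
  resp=$(curl -fsS -X POST "${PC_URL:-http://localhost:8420}/generate" \
    -H "Content-Type: application/json" -d "$body") || return 1
  # Assumes a flat response like {"task_id": "abc123", "status": "queued"}
  id=$(printf '%s\n' "$resp" | sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ -z "$id" ]; then
    echo "unexpected response: $resp" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage:
# task=$(submit_task '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}') || exit 1
```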
### Check Task Status

```bash
curl localhost:8420/task/abc123
```

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```
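Because `/generate` returns before the work is done, a client has to poll this endpoint. Here is a Bash polling sketch that downloads image 0 once the status reads `completed`. Only `queued` and `completed` appear above, so instead of guessing at failure status names it gives up after a deadline. `wait_and_download`, `POLL_SECS`, and `MAX_WAIT` are hypothetical, and their defaults are arbitrary.

```bash
# Poll a task until its status is "completed", then download image 0 to OUT.
# PC_URL, POLL_SECS and MAX_WAIT are hypothetical knobs with arbitrary defaults.
wait_and_download() {
  local task=$1 out=$2 base=${PC_URL:-http://localhost:8420}
  local interval=${POLL_SECS:-5} limit=${MAX_WAIT:-300} waited=0 status=""
  while [ "$waited" -lt "$limit" ]; do
    status=$(curl -fsS "$base/task/$task" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    if [ "$status" = "completed" ]; then
      curl -fsS "$base/task/$task/image/0" -o "$out"
      return
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "task $task not completed after ${limit}s (last status: ${status:-unknown})" >&2
  return 1
}

# Usage:
# wait_and_download abc123 result.png
```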
### Download Result

```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```
### With Reference Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'
```

`reference_images` must be absolute local file paths.
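Building this body by string concatenation breaks as soon as a prompt contains a double quote. Here is a Bash sketch of a builder that escapes strings and rejects relative reference paths. `json_escape` and `build_body` are hypothetical names. The escaping covers only backslashes and double quotes, so it assumes single-line prompts without control characters.

```bash
# Escape a string for a JSON string literal (backslashes and double quotes only).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a POST /generate body: build_body PROMPT [ABSOLUTE_REF_PATH...]
build_body() {
  local prompt=$1 refs="" ref
  shift
  for ref in "$@"; do
    case $ref in
      /*) ;;
      *) echo "reference_images entries must be absolute: $ref" >&2; return 1 ;;
    esac
    # Append as a quoted JSON string, comma-separated after the first entry
    refs="$refs${refs:+, }\"$(json_escape "$ref")\""
  done
  if [ -n "$refs" ]; then
    printf '{"prompt": "%s", "reference_images": [%s]}\n' "$(json_escape "$prompt")" "$refs"
  else
    printf '{"prompt": "%s"}\n' "$(json_escape "$prompt")"
  fi
}

# Usage:
# body=$(build_body "same character from 4 angles in a 2x2 grid" /absolute/path/to/sprite.png) || exit 1
# curl -X POST localhost:8420/generate -H "Content-Type: application/json" -d "$body"
```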
### Check Server Health

```bash
curl localhost:8420/health
```
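A script that starts the server itself can wait for this endpoint before submitting work. Here is a Bash sketch. The health response body is not documented here, so the sketch only relies on `/health` answering with an HTTP success status (`curl -f`). `wait_for_server` and its default retry count are hypothetical.

```bash
# Block until /health answers with a success status, polling once per second.
# Gives up after TRIES attempts (default 30).
wait_for_server() {
  local url=${1:-http://localhost:8420} tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl -fsS "$url/health" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "server at $url did not become healthy" >&2
  return 1
}

# Usage:
# phantom-canvas serve --port 8420 &
# wait_for_server http://localhost:8420 60 || exit 1
```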
### API Parameters (POST /generate)

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | none | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
| `callback_url` | string | none | Webhook URL for async completion |
| `conversation_id` | string | none | Continue existing conversation |
## Best Practices for Website Images

### Use #00FF00 Green Background

Instead of asking for a "transparent background", use:

```
pixel art knight, isometric view, solid green #00FF00 chroma-key background
```

Gemini interprets "transparent" as a checkerboard pattern and paints it into the image. A solid green screen, by contrast, is easy to chroma-key out in post-processing.
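For the chroma-key step, ImageMagick's `-transparent` operator with a `-fuzz` tolerance turns pixels close to `#00FF00` transparent. This sketch assumes ImageMagick 7 (the `magick` command; ImageMagick 6 uses `convert` instead). The 10% default fuzz is an arbitrary starting point, and anti-aliased edges may still leave a green fringe. `chroma_key` and the `MAGICK` override are hypothetical names.

```bash
# Make near-#00FF00 pixels transparent; the output needs an alpha-capable format (e.g. PNG).
# MAGICK is a hypothetical override for the ImageMagick binary (defaults to magick).
chroma_key() {
  local in=$1 out=$2 fuzz=${3:-10%}
  "${MAGICK:-magick}" "$in" -fuzz "$fuzz" -transparent "#00FF00" "$out"
}

# Usage:
# phantom-canvas generate "pixel art knight, isometric view, solid green #00FF00 chroma-key background" -o knight.png
# chroma_key knight.png knight-transparent.png
```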
### Iterative Workflow

For website hero images or banners, use multi-turn:

- First generate a base concept
- Use `--conversation` to refine colors, layout, and text
- Download the final version
### Prompt Tips

- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
- Specify dimensions and composition when possible
- For website images, mention the context: "hero banner for a tech startup website"
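You can keep these tips consistent across requests with a tiny prompt template that joins style, subject, composition, and context. Here is a Bash sketch. `build_prompt` is a hypothetical helper, and the comma-separated phrasing is one reasonable format, not something Gemini requires.

```bash
# Join the non-empty parts (e.g. style, subject, composition, context) with ", ".
build_prompt() {
  local part out=""
  for part in "$@"; do
    [ -n "$part" ] || continue
    out="$out${out:+, }$part"
  done
  printf '%s\n' "$out"
}

# Usage:
# phantom-canvas generate "$(build_prompt "flat design" "abstract network illustration" \
#   "wide horizontal composition" "hero banner for a tech startup website")" -o hero.png
```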
## Error Handling

| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set the Chrome path |
| "Session expired" | Run `phantom-canvas chrome` and log in again in the browser window |
| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Start the server on a different port, e.g. `--port 8430` |
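The table suggests a retry policy for scripts: retry timeouts with a longer `--timeout`, but stop at once on an expired session or an exhausted quota, since retrying fixes neither. Here is a Bash sketch. It assumes failures come back as a non-zero exit status with the error text on stderr. The matched phrases ("session expired", "quota") are guesses based on this table, not documented messages. `generate_with_retry`, `START_TIMEOUT`, and `PC_BIN` are hypothetical names.

```bash
# Usage: generate_with_retry PROMPT OUTPUT [extra phantom-canvas flags...]
# Up to 3 attempts, doubling --timeout each time (START_TIMEOUT, default 180s),
# but stopping at once on errors that retrying cannot fix.
generate_with_retry() {
  local prompt=$1 out=$2 timeout=${START_TIMEOUT:-180} n=1 errlog
  shift 2
  errlog=$(mktemp) || return 1
  while [ "$n" -le 3 ]; do
    # stdout (the JSON result) passes through; stderr is captured for inspection
    if "${PC_BIN:-phantom-canvas}" generate "$prompt" -o "$out" --timeout "$timeout" "$@" 2>"$errlog"; then
      rm -f "$errlog"
      return 0
    fi
    if grep -qi "session expired" "$errlog"; then
      echo "Session expired: run 'phantom-canvas chrome' and log in again." >&2
      break
    elif grep -qi "quota" "$errlog"; then
      echo "Quota exceeded: wait for Gemini's daily limit to reset." >&2
      break
    fi
    echo "attempt $n failed with --timeout $timeout" >&2
    timeout=$((timeout * 2))
    n=$((n + 1))
  done
  rm -f "$errlog"
  return 1
}

# Usage:
# START_TIMEOUT=300 generate_with_retry "walk cycle animation" walk.mp4 --video --ref knight.png
```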
