---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys; uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---

# Phantom Canvas

**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing, just your Google account. A persistent Chrome browser runs in the background via CDP, automating Gemini's web UI.

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.

![Phantom Canvas logo](logo.png)

## Quick Check

```bash
# Verify installation
phantom-canvas --help
```

If you see the logo and command list, it's installed. If not:

```bash
npm install -g phantom-canvas
```

## First-Time Setup (Required Once)

The first time, Chrome needs to open visibly so you can log into your Google account:

```bash
# Open Chrome. A window appears; log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```

After logging in once, the session persists at `~/.phantom-canvas/chrome-profile/`. All future runs can be headless.

**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and re-login. Do NOT automate login; it requires human interaction with Google auth.

## CLI Generate (for agents and scripts)

Output is JSON on stdout. Logs go to stderr. Images downloaded at full resolution (1024px).
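Because the JSON result and the logs travel on separate streams, a script can keep stdout clean for parsing while saving the logs for debugging. The sketch below is one way to do that; `run_with_log` and the log file name are hypothetical and not part of phantom-canvas.

```bash
# Hypothetical helper: run any command, append its stderr (logs) to a file,
# and leave stdout (the JSON result) untouched for the caller to capture.
# Usage: RESULT=$(run_with_log gen.log phantom-canvas generate "prompt" -o out.png)
run_with_log() {
  local log_file="$1"
  shift
  "$@" 2>>"$log_file"
}
```

With this wrapper, `RESULT` holds only the JSON line, and `gen.log` collects whatever the tool wrote to stderr across runs.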
### Text-to-Image

```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```

Returns JSON:

```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```

### Image-to-Image (Reference Upload)

Use an existing image as visual reference:

```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```

Reference path must be absolute or relative to cwd.

### Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine:

```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
# Quote the variable so the JSON is passed to jq unmodified
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```

### Video Generation

```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```

Video generation takes 1-2 minutes. Gemini has daily video quotas.
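Before reusing a `conversation_id` in a follow-up round, a script may want to confirm that the previous round actually finished. The sketch below assumes the JSON shape shown under Text-to-Image and treats any status other than `"completed"` as a failure. That is an assumption, since other status values are not documented here. The function name `conversation_id_if_completed` is hypothetical, and `jq` must be installed.

```bash
# Hypothetical helper: read a generate result (JSON) from stdin.
# Prints conversation_id when status is "completed"; otherwise reports the
# status on stderr and returns non-zero.
# Assumption: any status other than "completed" means the round is unusable.
conversation_id_if_completed() {
  local json status
  json=$(cat)
  status=$(printf '%s' "$json" | jq -r '.status')
  if [ "$status" != "completed" ]; then
    echo "generation not completed (status: $status)" >&2
    return 1
  fi
  printf '%s' "$json" | jq -r '.conversation_id'
}

# Usage:
#   CONV=$(phantom-canvas generate "pixel art knight" -o knight.png \
#            | conversation_id_if_completed) || exit 1
```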
### CLI Options

| Flag | Description |
|---|---|
| `-o, --output` | Output file path |
| `--ref` | Reference image path (absolute, or relative to cwd) |
| `--video` | Generate video instead of image |
| `--conversation` | Continue previous conversation |
| `--timeout` | Timeout in seconds (default: 180 image, 300 video) |
| `--headed` | Show browser window (default: headless) |
| `--cdp` | Chrome DevTools URL (default: http://127.0.0.1:9222) |

## HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

```bash
# Start the server
phantom-canvas serve [--port 8420]
```

### Generate an Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```

Returns immediately with a task ID:

```json
{"task_id": "abc123", "status": "queued"}
```

### Check Task Status

```bash
curl localhost:8420/task/abc123
```

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```

### Download Result

```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```

### With Reference Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'
```

> `reference_images` must be **absolute local file paths**.
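Since `POST /generate` returns before the image exists, a client has to poll the task endpoint until the work is done. The sketch below polls `GET /task/<id>` until the status is `"completed"` and gives up after a fixed number of attempts. The function name `wait_for_task`, the poll count, and the interval are hypothetical choices. Only the `"queued"` and `"completed"` statuses appear in this document, so the loop treats everything else as "not done yet". It needs `curl` and `jq`.

```bash
# Hypothetical helper: poll a task until it reports "completed".
# Usage: wait_for_task TASK_ID [BASE_URL] [MAX_POLLS] [INTERVAL_SECS]
# Prints the final task JSON on success; returns non-zero on give-up.
wait_for_task() {
  local task_id="$1"
  local base_url="${2:-http://localhost:8420}"
  local max_polls="${3:-60}"
  local interval="${4:-5}"
  local i=0 body status
  while [ "$i" -lt "$max_polls" ]; do
    body=$(curl -sS "$base_url/task/$task_id") || return 1
    status=$(printf '%s' "$body" | jq -r '.status')
    if [ "$status" = "completed" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "task $task_id not completed after $max_polls polls (last status: $status)" >&2
  return 1
}

# Usage:
#   wait_for_task abc123 >/dev/null && \
#     curl localhost:8420/task/abc123/image/0 -o result.png
```

A fixed poll budget keeps a stuck task from blocking a pipeline forever. The `callback_url` field is the alternative when the caller can receive webhooks.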
### Check Server Health

```bash
curl localhost:8420/health
```

### API Parameters (POST /generate)

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | none | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
| `callback_url` | string | none | Webhook URL for async completion |
| `conversation_id` | string | none | Continue existing conversation |

## Best Practices for Website Images

### Use #00FF00 Green Background

Instead of asking for "transparent background", use:

```
"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
```

Gemini interprets "transparent" as a checkerboard pattern. Green screen is easy to chroma-key in post-processing.

### Iterative Workflow

For website hero images or banners, use multi-turn:

1. First generate a base concept
2. Use `--conversation` to refine colors, layout, text
3. Download the final version

### Prompt Tips

- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art"
- Specify dimensions and composition when possible
- For website images, mention the context: "hero banner for a tech startup website"

## Error Handling

| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set Chrome path |
| "Session expired" | Run `phantom-canvas chrome`, re-login in the browser window |
| Timeout / empty images | Retry with a different prompt or a longer `--timeout` |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Use `--port 8430` or a different port |
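The "retry with a longer `--timeout`" advice in the table above can be scripted. The sketch below reruns a command with each timeout from a list until one attempt succeeds. It assumes the command exits non-zero on failure, which this document does not confirm for phantom-canvas; if it always exits 0, check the JSON `status` field instead. The name `retry_with_timeouts` and the timeout values are hypothetical.

```bash
# Hypothetical helper: rerun a command with progressively longer --timeout
# values, stopping at the first attempt that exits 0.
# Assumption: the wrapped command signals failure with a non-zero exit code.
# Usage: retry_with_timeouts "180 360 720" phantom-canvas generate "prompt" -o out.png
retry_with_timeouts() {
  local timeouts="$1"
  shift
  local t
  for t in $timeouts; do
    if "$@" --timeout "$t"; then
      return 0
    fi
    echo "attempt with --timeout $t failed" >&2
  done
  return 1
}
```

This only covers the timeout case. A session expiry or an exhausted video quota will fail on every attempt and needs the manual actions listed in the table.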