---
name: phantom-canvas
description: CLI tool and HTTP API for Gemini image/video generation via Chrome CDP. Text-to-image, img2img with reference upload, multi-turn conversation, video generation. No API keys; uses Chrome's persistent Google login. Great for generating website images, game assets, pixel art, and more.
allowed-tools: Bash
---

# Phantom Canvas

**CLI + HTTP API** for image and video generation through **Gemini Web**. No API keys, no billing: just your Google account. A persistent Chrome browser runs in the background and automates Gemini's web UI over the Chrome DevTools Protocol (CDP).

Great for: website images, hero banners, illustrations, pixel art, game sprites, product mockups, social media graphics.

## Quick Check

```bash
# Verify installation
phantom-canvas --help
```

If you see the logo and command list, it's installed. If not:

```bash
npm install -g phantom-canvas
```
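
For unattended runs, an agent can do the same check in a script. This is a minimal sketch. `ensure_phantom_canvas` is a hypothetical helper name, and it assumes `npm` is on `PATH`.

```bash
# Hypothetical helper: install phantom-canvas globally only if it is missing.
# Assumes npm is on PATH.
ensure_phantom_canvas() {
  if command -v phantom-canvas >/dev/null 2>&1; then
    return 0
  fi
  echo "phantom-canvas not found; installing with npm" >&2
  npm install -g phantom-canvas
}
```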

## First-Time Setup (Required Once)

The first time, Chrome needs to open visibly so you can log into your Google account:

```bash
# Open Chrome: a window appears. Log into Google in that window.
phantom-canvas chrome

# Or just generate with --headed:
phantom-canvas generate "test image" --headed
```

After you log in once, the session persists in `~/.phantom-canvas/chrome-profile/`, and all future runs can be headless.

**When you see "Session expired"**, tell the user to run `phantom-canvas chrome` and log in again. Do NOT automate login: it requires human interaction with Google's sign-in flow.
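
In a script, you can stop on this condition instead of retrying. This is a minimal sketch. It assumes the "Session expired" message shows up in phantom-canvas's stderr logs, since logs go to stderr. `run_generate` is a hypothetical wrapper, and its exit code 2 is an arbitrary choice.

```bash
# Hypothetical wrapper: run a generation; if the Google session has expired,
# stop with a message for the user instead of retrying.
# Assumes "Session expired" appears in phantom-canvas's stderr logs.
run_generate() {
  local errlog status=0
  errlog=$(mktemp) || return 1
  # The JSON result still goes to stdout; logs are captured for inspection.
  phantom-canvas generate "$@" 2>"$errlog" || status=$?
  if grep -q "Session expired" "$errlog"; then
    echo "Google session expired: ask the user to run 'phantom-canvas chrome' and log in again." >&2
    rm -f "$errlog"
    return 2   # arbitrary code meaning "needs human login"
  fi
  cat "$errlog" >&2   # replay the captured logs
  rm -f "$errlog"
  return "$status"
}
```

Logs are replayed only after the command finishes, which is fine for agents but hides live progress.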

## CLI Generate (for agents and scripts)

Output is JSON on stdout; logs go to stderr. Images are downloaded at full resolution (1024px).

### Text-to-Image

```bash
# Simple generate
phantom-canvas generate "your prompt here" -o output.png
```

Returns JSON:

```json
{"status":"completed","path":"output.png","type":"image","conversation_id":"abc123"}
```
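
Because stdout carries only this JSON, a script can check `status` and pick up the saved path with `jq`. This is a minimal sketch. `generate_to` is a hypothetical helper, and treating any status other than `completed` as a failure is an assumption.

```bash
# Hypothetical helper: generate one image and print the saved path.
# Assumes any status other than "completed" means failure.
generate_to() {
  local prompt=$1 out=$2 result
  result=$(phantom-canvas generate "$prompt" -o "$out") || return 1
  if ! jq -e '.status == "completed"' >/dev/null <<<"$result"; then
    echo "generation did not complete: $result" >&2
    return 1
  fi
  jq -r '.path' <<<"$result"
}
```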

### Image-to-Image (Reference Upload)

Use an existing image as a visual reference:

```bash
phantom-canvas generate "same character, 4 directions in a 2x2 grid" --ref ./sprite.png -o sheet.png
```

The reference path can be absolute or relative to the current working directory.
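
The HTTP API accepts only absolute reference paths. If you reuse the same file there, it helps to resolve the path once. This is a minimal sketch in plain bash that does not need `realpath`. `abs_path` is a hypothetical helper name.

```bash
# Hypothetical helper: print the absolute path of an existing file,
# resolving it relative to the current working directory.
abs_path() {
  local file=$1
  if [ ! -f "$file" ]; then
    echo "reference image not found: $file" >&2
    return 1
  fi
  # Resolve the directory in a subshell so the caller's cwd is unchanged.
  ( cd "$(dirname "$file")" >/dev/null && printf '%s/%s\n' "$(pwd -P)" "$(basename "$file")" )
}
```

For example: `phantom-canvas generate "same character, back view" --ref "$(abs_path ./sprite.png)" -o back.png`.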

### Multi-Turn (Iterative Design)

Continue in the same Gemini conversation to refine a result (requires `jq`):

```bash
# Round 1
RESULT=$(phantom-canvas generate "pixel art knight, green bg" -o knight.png)
CONV=$(echo "$RESULT" | jq -r .conversation_id)

# Round 2: Gemini remembers the character
phantom-canvas generate "make the sword bigger" --conversation "$CONV" -o v2.png

# Round 3
phantom-canvas generate "show 4 directions" --conversation "$CONV" -o sheet.png
```

### Video Generation

```bash
phantom-canvas generate "walk cycle animation" --video --ref knight.png -o walk.mp4
```

Video generation takes 1-2 minutes, and Gemini enforces daily video quotas.

### CLI Options

| Flag | Description |
|---|---|
| `-o, --output <file>` | Output file path |
| `--ref <file>` | Reference image (absolute path, or relative to the current directory) |
| `--video` | Generate a video instead of an image |
| `--conversation <id>` | Continue a previous conversation |
| `--timeout <secs>` | Timeout in seconds (default: 180 for images, 300 for video) |
| `--headed` | Show the browser window (default: headless) |
| `--cdp <url>` | Chrome DevTools URL (default: `http://127.0.0.1:9222`) |

## HTTP API (Server Mode)

For programmatic access from apps, pipelines, or webhooks:

```bash
# Start the server (--port is optional; the examples below use 8420)
phantom-canvas serve --port 8420
```

### Generate an Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Isometric pixel art knight, FFT style, green #00FF00 bg"}'
```

The call returns immediately with a task ID:

```json
{"task_id": "abc123", "status": "queued"}
```
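
Building the JSON body with `jq` rather than by hand keeps prompts that contain quotes or newlines valid. This is a minimal sketch. `submit_task` is a hypothetical helper. `PC_URL` is a local convenience variable that phantom-canvas does not read; it defaults to the `localhost:8420` server used above.

```bash
# PC_URL is a local convenience variable (not read by phantom-canvas itself).
PC_URL=${PC_URL:-http://localhost:8420}

# Hypothetical helper: submit a prompt and print the returned task ID.
submit_task() {
  local body
  # jq escapes quotes, backslashes, and newlines in the prompt.
  body=$(jq -n --arg prompt "$1" '{prompt: $prompt}') || return 1
  curl -s -X POST "$PC_URL/generate" \
    -H "Content-Type: application/json" \
    -d "$body" | jq -r '.task_id'
}
```

For example: `TASK_ID=$(submit_task 'Isometric pixel art knight, "FFT" style')`.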

### Check Task Status

```bash
curl localhost:8420/task/abc123
```

A completed task looks like this:

```json
{
  "task_id": "abc123",
  "status": "completed",
  "images": [{"index": 0, "url": "/task/abc123/image/0", "type": "image"}],
  "conversation_id": "05695dfd143c4dad",
  "elapsed_secs": 45.3
}
```
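
Rather than checking by hand, a script can poll until the task reports `completed`. This is a minimal sketch that assumes `jq` is installed. `wait_for_task`, `PC_URL`, and `POLL_SECS` are hypothetical names. Only the `queued` and `completed` statuses are shown above, so this loop keeps polling on any other status until its own deadline.

```bash
# PC_URL is a local convenience variable (not read by phantom-canvas itself).
PC_URL=${PC_URL:-http://localhost:8420}

# Hypothetical helper: poll a task until its status is "completed", then
# print the final JSON. Gives up after TIMEOUT seconds (default 300).
# POLL_SECS (hypothetical) sets the delay between checks.
wait_for_task() {
  local task_id=$1 timeout=${2:-300} interval=${POLL_SECS:-5}
  local deadline=$((SECONDS + timeout)) json status
  while :; do
    json=$(curl -s "$PC_URL/task/$task_id") || json=""
    status=$(jq -r '.status // empty' <<<"$json" 2>/dev/null) || status=""
    if [ "$status" = "completed" ]; then
      printf '%s\n' "$json"
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "task $task_id not completed after ${timeout}s (last status: ${status:-unknown})" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```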

### Download Result

```bash
curl localhost:8420/task/abc123/image/0 -o result.png
```
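
A task can list more than one file, for example with `num_images` greater than 1. A script can loop over the `images` array instead of hard-coding index 0. This is a minimal sketch that takes the task JSON returned by the status endpoint. `download_results` and `PC_URL` are hypothetical names. Saving entries whose `type` is not `"image"` as `.mp4` is an assumption about video tasks.

```bash
# PC_URL is a local convenience variable (not read by phantom-canvas itself).
PC_URL=${PC_URL:-http://localhost:8420}

# Hypothetical helper: download every file listed in a completed task's JSON.
# Usage: download_results "$TASK_JSON" prefix  -> prefix_0.png, prefix_1.png, ...
download_results() {
  local task_json=$1 prefix=${2:-result} index url type ext
  while read -r index url type; do
    # Assumption: non-image entries are videos, saved as .mp4.
    if [ "$type" = "image" ]; then ext=png; else ext=mp4; fi
    curl -s "$PC_URL$url" -o "${prefix}_${index}.${ext}" || return 1
    echo "${prefix}_${index}.${ext}"
  done < <(jq -r '.images[] | "\(.index) \(.url) \(.type)"' <<<"$task_json")
}
```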

### With Reference Image

```bash
curl -X POST localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "same character from 4 angles in a 2x2 grid",
    "reference_images": ["/absolute/path/to/sprite.png"]
  }'
```

> `reference_images` must be **absolute local file paths** on the machine running the server.
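
When the prompt and paths come from variables, `jq` can assemble the body, and a quick check can reject relative paths before the request is sent. This is a minimal sketch that assumes `jq` 1.6 or later (for `--args`). `reference_payload` is a hypothetical helper name.

```bash
# Hypothetical helper: print a /generate request body with reference images.
# Usage: reference_payload "prompt" /abs/a.png /abs/b.png
reference_payload() {
  local prompt=$1 ref
  shift
  for ref in "$@"; do
    case $ref in
      /*) ;;  # absolute path: OK
      *) echo "reference_images must be absolute paths: $ref" >&2; return 1 ;;
    esac
  done
  # Remaining arguments become $ARGS.positional (jq 1.6+).
  jq -n --arg prompt "$prompt" \
    '{prompt: $prompt, reference_images: $ARGS.positional}' --args "$@"
}
```

Pass the result with `-d "$(reference_payload "same character from 4 angles" /abs/path/sprite.png)"`.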

### Check Server Health

```bash
curl localhost:8420/health
```
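
If `phantom-canvas serve` is started in the background, a script can wait for the health endpoint before submitting work. This is a minimal sketch. It assumes `/health` answers with a 2xx status once the server is ready; `curl -f` fails on HTTP errors and on refused connections. `wait_for_server` is a hypothetical helper.

```bash
# Hypothetical helper: wait up to TIMEOUT seconds for the server's /health.
# Assumes /health returns a 2xx status when the server is ready.
wait_for_server() {
  local url=${1:-http://localhost:8420} timeout=${2:-30}
  local deadline=$((SECONDS + timeout))
  until curl -sf "$url/health" >/dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "server at $url not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example: `phantom-canvas serve --port 8420 & wait_for_server http://localhost:8420 60`.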

### API Parameters (POST /generate)

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Generation prompt |
| `type` | `"image"` \| `"video"` | `"image"` | Output type |
| `reference_images` | string[] | none | Absolute paths to reference images |
| `num_images` | number | 1 | How many images to generate |
| `timeout_secs` | number | 180 / 300 | Timeout in seconds (image / video) |
| `callback_url` | string | none | Webhook URL notified when the task completes |
| `conversation_id` | string | none | Continue an existing conversation |

## Best Practices for Website Images

### Use #00FF00 Green Background

Instead of asking for a "transparent background", use:

```
"pixel art knight, isometric view, solid green #00FF00 chroma-key background"
```

Gemini interprets "transparent" as a checkerboard pattern, while a solid green screen is easy to chroma-key out in post-processing.
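
One way to do that post-processing assumes ImageMagick is installed; it is not part of phantom-canvas. ImageMagick's `-transparent` option turns pixels of the given color transparent, and `-fuzz` widens the match to near-green shades. `chroma_key` is a hypothetical wrapper. The 15% default is a starting point to tune, not a tested value.

```bash
# Hypothetical wrapper: replace the #00FF00 background with transparency.
# Uses ImageMagick 7's `magick`, falling back to ImageMagick 6's `convert`.
chroma_key() {
  local in=$1 out=$2 fuzz=${3:-15%} im
  if command -v magick >/dev/null 2>&1; then
    im=magick
  elif command -v convert >/dev/null 2>&1; then
    im=convert
  else
    echo "ImageMagick not found" >&2
    return 1
  fi
  # -fuzz must come before -transparent; PNG output keeps the alpha channel.
  "$im" "$in" -fuzz "$fuzz" -transparent '#00FF00' "$out"
}
```

Anti-aliased edges may keep a faint green fringe. Raising the fuzz value removes more of it, but it can also cut into green parts of the subject.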

### Iterative Workflow

For website hero images or banners, use multi-turn:

1. Generate a base concept.
2. Use `--conversation` to refine colors, layout, and text.
3. Download the final version.
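
Over the HTTP API, the same loop works by sending the `conversation_id` from a finished task's status response back in the next `POST /generate` body. This is a minimal sketch of that request body and assumes `jq`. `refine_payload` is a hypothetical helper name.

```bash
# Hypothetical helper: build a follow-up request that continues a conversation.
# Usage: refine_payload "make the logo larger" "$CONV_ID"
refine_payload() {
  jq -n --arg prompt "$1" --arg conv "$2" \
    '{prompt: $prompt, conversation_id: $conv}'
}
```

To get the ID, use `CONV_ID=$(curl -s localhost:8420/task/abc123 | jq -r .conversation_id)`, which reads it from the status response.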

### Prompt Tips

- Be specific about art style: "minimalist", "flat design", "3D render", "watercolor", "pixel art".
- Specify dimensions and composition when possible.
- For website images, mention the context: "hero banner for a tech startup website".

## Error Handling

| Error | Action |
|---|---|
| Chrome failed to start | Install Chrome or set the Chrome path |
| "Session expired" | Run `phantom-canvas chrome` and log in again in the browser window |
| Timeout / empty images | Retry with a different prompt or a longer `--timeout` (`timeout_secs` in the API) |
| Video quota exceeded | Wait until tomorrow (Gemini daily limit) |
| Port already in use | Start the server on a different port, e.g. `--port 8430` |