feat: Import 35+ skills, merge duplicates, add openclaw installer
Major updates:
- Added 35+ new skills from awesome-opencode-skills and antigravity repos
- Merged SEO skills into seo-master
- Merged architecture skills into architecture
- Merged security skills into security-auditor and security-coder
- Merged testing skills into testing-master and testing-patterns
- Merged pentesting skills into pentesting
- Renamed website-creator to thai-frontend-dev
- Replaced skill-creator with github version
- Removed Chutes references (use MiniMax API instead)
- Added install-openclaw-skills.sh for cross-platform installation
- Updated .env.example with MiniMax API credentials
649
skills/minimax-multimodal-toolkit/SKILL.md
Normal file
@@ -0,0 +1,649 @@
---
name: minimax-multimodal-toolkit
description: "MiniMax multimodal model skill. Use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs."
---

# MiniMax Multi-Modal Toolkit

Generate voice, music, video, and image content via MiniMax APIs. This skill is the unified entry point for **MiniMax multimodal** use cases (audio + music + video + image). It includes voice cloning and voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction.

## Output Directory

**All generated files MUST be saved to `minimax-output/` under the AGENT'S current working directory (NOT the skill directory).** Every script call MUST include an explicit `--output` / `-o` argument pointing to this location. Never omit the output argument or rely on script defaults.

**Rules:**
1. Before running any script, ensure `minimax-output/` exists in the agent's working directory (create it if needed: `mkdir -p minimax-output`)
2. Always use absolute paths or paths relative to the agent's working directory: `--output minimax-output/video.mp4`
3. **Never** `cd` into the skill directory to run scripts; run them from the agent's working directory using the full script path
4. Intermediate/temp files (segment audio, video segments, extracted frames) are automatically placed in `minimax-output/tmp/`. They can be cleaned up when no longer needed: `rm -rf minimax-output/tmp`
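
As a minimal sketch of rules 1 and 2, a small wrapper can create the directory and build the output path in one step. The function name `minimax_out` is hypothetical and is not part of the skill's scripts.

```bash
# Hypothetical helper (not part of the skill): create minimax-output/ in the
# current working directory and print the output path for a given file name.
minimax_out() {
  mkdir -p "minimax-output" || return 1
  printf '%s\n' "minimax-output/$1"
}
```

It could then be used as `--output "$(minimax_out video.mp4)"` when calling a script from the agent's working directory.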

## Prerequisites

```bash
brew install ffmpeg jq # macOS (or apt install ffmpeg jq on Linux)
bash scripts/check_environment.sh
```

No Python or pip required: all scripts are pure bash using `curl`, `ffmpeg`, `jq`, and `xxd`.

### API Host Configuration

MiniMax provides two service endpoints for different regions. Set `MINIMAX_API_HOST` before running any script:

| Region | Platform URL | API Host Value |
|--------|-------------|----------------|
| China Mainland | https://platform.minimaxi.com | `https://api.minimaxi.com` |
| Global | https://platform.minimax.io | `https://api.minimax.io` |

```bash
# China Mainland
export MINIMAX_API_HOST="https://api.minimaxi.com"

# or Global
export MINIMAX_API_HOST="https://api.minimax.io"
```

**IMPORTANT (when API Host is missing):**
Before running any script, check whether `MINIMAX_API_HOST` is set in the environment. If it is NOT configured:
1. Ask the user which service endpoint their MiniMax account uses:
   - **China Mainland** → `https://api.minimaxi.com`
   - **Global** → `https://api.minimax.io`
2. Help the user set it via `export MINIMAX_API_HOST="https://api.minimaxi.com"` (or the global variant) in their terminal, or add it to their shell profile (`~/.zshrc` / `~/.bashrc`) for persistence
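
The check described above could be sketched as follows. This is an illustration only: the function name `check_api_host` is an assumption, and treating every value other than the two documented hosts as unexpected may be stricter than the real scripts.

```bash
# Sketch only (hypothetical function): succeed when MINIMAX_API_HOST is one
# of the two hosts listed in the table, otherwise explain what is wrong.
check_api_host() {
  case "${MINIMAX_API_HOST:-}" in
    https://api.minimaxi.com|https://api.minimax.io)
      return 0 ;;
    "")
      echo "MINIMAX_API_HOST is not set; ask the user which region they use." >&2
      return 1 ;;
    *)
      echo "Unexpected MINIMAX_API_HOST: ${MINIMAX_API_HOST}" >&2
      return 1 ;;
  esac
}
```

Calling `check_api_host || exit 1` before any script invocation would stop early instead of failing inside an API call.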

### API Key Configuration

Set the `MINIMAX_API_KEY` environment variable before running any script:

```bash
export MINIMAX_API_KEY="your-api-key-here"
```

The key starts with `sk-api-` or `sk-cp-` and can be obtained from https://platform.minimaxi.com (China) or https://platform.minimax.io (Global).

**IMPORTANT (when API Key is missing):**
Before running any script, check whether `MINIMAX_API_KEY` is set in the environment. If it is NOT configured:
1. Ask the user to provide their MiniMax API key
2. Help the user set it via `export MINIMAX_API_KEY="sk-..."` in their terminal, or add it to their shell profile (`~/.zshrc` / `~/.bashrc`) for persistence
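
A matching sketch for the key check, using only the two prefixes stated above. The function name `check_api_key` is hypothetical, and the key itself is deliberately never printed.

```bash
# Sketch only (hypothetical function): succeed when MINIMAX_API_KEY is set
# and starts with one of the documented prefixes. Never echoes the key.
check_api_key() {
  case "${MINIMAX_API_KEY:-}" in
    sk-api-*|sk-cp-*)
      return 0 ;;
    "")
      echo "MINIMAX_API_KEY is not set; ask the user for their key." >&2
      return 1 ;;
    *)
      echo "MINIMAX_API_KEY does not start with sk-api- or sk-cp-." >&2
      return 1 ;;
  esac
}
```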

## Key Capabilities

| Capability | Description | Entry point |
|------------|-------------|-------------|
| TTS | Text-to-speech synthesis with multiple voices and emotions | `scripts/tts/generate_voice.sh` |
| Voice Cloning | Clone a voice from an audio sample (10s–5min) | `scripts/tts/generate_voice.sh clone` |
| Voice Design | Create a custom voice from a text description | `scripts/tts/generate_voice.sh design` |
| Music Generation | Generate songs with lyrics or instrumental tracks | `scripts/music/generate_music.sh` |
| Image Generation | Text-to-image, image-to-image with character reference | `scripts/image/generate_image.sh` |
| Video Generation | Text-to-video, image-to-video, subject reference, templates | `scripts/video/generate_video.sh` |
| Long Video | Multi-scene chained video with crossfade transitions | `scripts/video/generate_long_video.sh` |
| Media Tools | Audio/video format conversion, concatenation, trimming, extraction | `scripts/media_tools.sh` |

## TTS (Text-to-Speech)

Entry point: `scripts/tts/generate_voice.sh`

### IMPORTANT: Single voice vs Multi-segment (choose the right approach)

| User intent | Approach |
|-------------|----------|
| Single voice / no multi-character need | `tts` command: generate the entire text in one call |
| Multiple characters / narrator + dialogue | `generate` command with segments.json |

**Default behavior:** When the user simply asks to generate speech/voice and does NOT mention multiple voices or characters, use the `tts` command directly with a single appropriate voice. Do NOT split into segments or use the multi-segment pipeline; just pass the full text to `tts` in one call.

Only use multi-segment `generate` when:
- The user explicitly needs multiple voices/characters
- The text requires narrator + character dialogue separation
- The text exceeds **10,000 characters** (API limit per request); in this case, split it into segments that use the same voice
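
The decision above can be sketched as a small function. The name `tts_command_for` and its `yes`/`no` second argument are assumptions made for illustration; only the `tts` and `generate` subcommands and the 10,000-character limit come from this document.

```bash
# Sketch only (hypothetical function): print which generate_voice.sh
# subcommand fits the request.
#   $1 = text to speak
#   $2 = "yes" when multiple voices/characters are needed (default "no")
# ${#text} counts characters in bash but bytes in some other shells, so
# treat the length check as an approximation for non-ASCII text.
tts_command_for() {
  text="$1"
  multi_voice="${2:-no}"
  if [ "$multi_voice" = "yes" ] || [ "${#text}" -gt 10000 ]; then
    echo "generate"
  else
    echo "tts"
  fi
}
```

For example, `tts_command_for "Hello world"` prints `tts`, while passing `yes` as the second argument prints `generate`.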

### Single-voice generation (DEFAULT)

```bash
bash scripts/tts/generate_voice.sh tts "Hello world" -o minimax-output/hello.mp3
# Chinese-language input ("Hello world") with an explicit voice
bash scripts/tts/generate_voice.sh tts "你好世界" -v female-shaonv -o minimax-output/hello_cn.mp3
```

### Multi-segment generation (multi-voice / audiobook / podcast)

**Complete workflow (follow ALL steps in order):**

1. **Write segments.json**: split the text into segments with voice assignments (see format and rules below)
2. **Run the `generate` command**: this reads segments.json, generates audio for EACH segment via the TTS API, then merges the segments into a single output file with crossfade

```bash
# Step 1: Write segments.json to minimax-output/
# (use the Write tool to create minimax-output/segments.json)

# Step 2: Generate audio from segments.json (this is the CRITICAL step)
# It generates each segment individually and merges them into one file
bash scripts/tts/generate_voice.sh generate minimax-output/segments.json \
  -o minimax-output/output.mp3 --crossfade 200
```

**Do NOT skip Step 2.** Writing segments.json alone does nothing; you MUST run the `generate` command to actually produce audio.

### Voice management

```bash
# List all available voices
bash scripts/tts/generate_voice.sh list-voices

# Voice cloning (from audio sample, 10s–5min)
bash scripts/tts/generate_voice.sh clone sample.mp3 --voice-id my-voice

# Voice design (from text description)
bash scripts/tts/generate_voice.sh design "A warm female narrator voice" --voice-id narrator
```

### Audio processing

```bash
bash scripts/tts/generate_voice.sh merge part1.mp3 part2.mp3 -o minimax-output/combined.mp3
bash scripts/tts/generate_voice.sh convert input.wav -o minimax-output/output.mp3
```

### TTS Models

| Model | Notes |
|-------|-------|
| speech-2.8-hd | Recommended, auto emotion matching |
| speech-2.8-turbo | Faster variant |
| speech-2.6-hd | Previous gen, manual emotion |
| speech-2.6-turbo | Previous gen, faster |

### segments.json Format

Default crossfade between segments: **200ms** (`--crossfade 200`).

```json
[
  { "text": "Hello!", "voice_id": "female-shaonv", "emotion": "" },
  { "text": "Welcome.", "voice_id": "male-qn-qingse", "emotion": "happy" }
]
```

Leave `emotion` empty for speech-2.8 models (auto-matched from text).

### IMPORTANT: Multi-Segment Script Generation Rules (Audiobooks, Podcasts, etc.)

When generating segments.json for audiobooks, podcasts, or any multi-character narration, you MUST split narration text from character dialogue into separate segments with distinct voices.

**Rule: Narration and dialogue are ALWAYS separate segments.**

A sentence like `"Tom said: The weather is great today!"` must be split into two segments:
- Segment 1 (narrator voice): `"Tom said:"`
- Segment 2 (character voice): `"The weather is great today!"`

**Example: audiobook with narrator + 2 characters**

```json
[
  { "text": "Morning sunlight streamed into the classroom as students filed in one by one.", "voice_id": "narrator-voice", "emotion": "" },
  { "text": "Tom smiled and turned to Lisa:", "voice_id": "narrator-voice", "emotion": "" },
  { "text": "The weather is amazing today! Let's go to the park after school!", "voice_id": "tom-voice", "emotion": "happy" },
  { "text": "Lisa thought for a moment, then replied:", "voice_id": "narrator-voice", "emotion": "" },
  { "text": "Sure, but I need to drop off my backpack at home first.", "voice_id": "lisa-voice", "emotion": "" },
  { "text": "They exchanged a smile and went back to listening to the lecture.", "voice_id": "narrator-voice", "emotion": "" }
]
```

**Key principles:**
1. **Narrator** uses a consistent neutral narrator voice throughout
2. **Each character** has a dedicated voice_id, maintained consistently across all their dialogue
3. **Split at dialogue boundaries**: `"He said:"` is the narrator, the quoted content is the character
4. **Do NOT merge** narrator text and character speech into a single segment
5. For characters without pre-existing voice_ids, use voice cloning or voice design to create them first, then reference the created voice_id in segments

## Music Generation

Entry point: `scripts/music/generate_music.sh`

### IMPORTANT: Instrumental vs Lyrics (when to use which)

| Scenario | Mode | Action |
|----------|------|--------|
| BGM for video / voice / podcast | Instrumental (default) | Use `--instrumental` directly, do NOT ask user |
| User explicitly asks to "create music" / "make a song" | Ask user first | Ask whether they want instrumental or with lyrics |

**When adding background music to video or voice content**, always default to instrumental mode (`--instrumental`). Do not ask the user: BGM should never have vocals competing with the main content.

**When the user explicitly asks to create/generate music as the primary task**, ask them whether they want:
- Instrumental (pure music, no vocals)
- With lyrics (song with vocals; the user provides the lyrics or you help write them)
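
The two scenarios in the table can be expressed as a tiny lookup. This is only a sketch: the purpose labels `bgm` and `primary` and the function name are hypothetical, and falling back to asking the user for any other label is an assumption rather than something the skill specifies.

```bash
# Sketch only (hypothetical function and labels): map the purpose of a music
# request to the action described in the table.
music_action_for() {
  case "$1" in
    bgm)     echo "--instrumental" ;;  # background music: do not ask the user
    primary) echo "ask-user" ;;        # music is the main task: ask first
    *)       echo "ask-user" ;;        # unclear intent (assumption): ask
  esac
}
```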

```bash
# Instrumental (for BGM or when user chooses instrumental)
bash scripts/music/generate_music.sh \
  --instrumental \
  --prompt "ambient electronic, atmospheric" \
  --output minimax-output/ambient.mp3 --download

# Song with lyrics (when user chooses vocal music)
bash scripts/music/generate_music.sh \
  --lyrics "[verse]\nHello world\n[chorus]\nLa la la" \
  --prompt "indie folk, melancholic" \
  --output minimax-output/song.mp3 --download

# With style fields
bash scripts/music/generate_music.sh \
  --lyrics "[verse]\nLyrics here" \
  --genre "pop" --mood "upbeat" --tempo "fast" \
  --output minimax-output/pop_track.mp3 --download
```

### Music Model

Default model: `music-2.5`

`music-2.5` does **not** support `--instrumental` directly. When instrumental music is needed, the script automatically applies a workaround:
- Sets lyrics to `[intro] [outro]` (empty structural tags, no actual vocals) and appends `pure music, no lyrics` to the prompt

This produces instrumental-style output without requiring manual intervention. You can always use `--instrumental` and the script handles the rest.
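
To make the workaround concrete, here is a rough sketch of the transformation it describes. It is not the script's actual code: the function name is hypothetical, and joining the suffix to the prompt with a comma and a space is an assumption.

```bash
# Sketch only (hypothetical function): build the lyrics and prompt values
# for an instrumental request as described above.
# Prints the lyrics on line 1 and the adjusted prompt on line 2.
instrumental_request() {
  prompt="$1"
  printf '%s\n' "[intro] [outro]"
  printf '%s\n' "${prompt}, pure music, no lyrics"
}
```

For a prompt of `soft piano`, the second line would read `soft piano, pure music, no lyrics`.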

## Image Generation

Entry point: `scripts/image/generate_image.sh`

Model: `image-01`, which provides photorealistic image generation from text prompts, with optional character reference for image-to-image.

### IMPORTANT: Mode Selection (t2i vs i2i)

| User intent | Mode |
|-------------|------|
| Generate image from text description (default) | `t2i` (text-to-image) |
| Generate image with a character reference photo (keep same person) | `i2i` (image-to-image) |

**Default behavior:** When the user asks to generate/create an image without mentioning a reference photo, use `t2i` mode (default). Only use `i2i` mode when the user provides a character reference image or explicitly asks to base the image on an existing person's appearance.

### IMPORTANT: Aspect Ratio (infer from user context)

Do NOT always default to `1:1`. Analyze the user's request and choose the most appropriate aspect ratio:

| User intent / context | Recommended ratio | Resolution |
|-----------------------|-------------------|------------|
| Avatar, icon, social media profile picture | `1:1` | 1024×1024 |
| Landscape, banner, desktop wallpaper | `16:9` | 1280×720 |
| Traditional photo, classic proportions | `4:3` | 1152×864 |
| Photography, magazine cover | `3:2` | 1248×832 |
| Vertical portrait photo, poster | `2:3` | 832×1248 |
| Tall poster, book cover | `3:4` | 864×1152 |
| Phone wallpaper, social media story, reel | `9:16` | 720×1280 |
| Ultra-wide panorama, cinematic ultrawide frame | `21:9` | 1344×576 |
| No specific requirement / ambiguous | `1:1` | 1024×1024 |

### IMPORTANT: Image Count (when to generate multiple images)

| User intent | Count (`-n`) |
|-------------|--------------|
| Default / single image request | `1` (default) |
| User says "a few", "several", or "some" | `3` |
| User asks for "variations", "options", or "alternatives" | `3`–`4` |
| User specifies an exact number | Use the specified number (1–9) |

### Text-to-Image Examples

```bash
# Basic text-to-image
bash scripts/image/generate_image.sh \
  --prompt "A cat sitting on a rooftop at sunset, cinematic lighting, warm tones, photorealistic" \
  -o minimax-output/cat.png

# Landscape with inferred aspect ratio
bash scripts/image/generate_image.sh \
  --prompt "Mountain landscape with misty valleys, photorealistic, golden hour" \
  --aspect-ratio 16:9 \
  -o minimax-output/landscape.png

# Phone wallpaper (portrait 9:16)
bash scripts/image/generate_image.sh \
  --prompt "Aurora borealis over a snowy forest, vivid colors, magical atmosphere" \
  --aspect-ratio 9:16 \
  -o minimax-output/wallpaper.png

# Multiple variations
bash scripts/image/generate_image.sh \
  --prompt "Abstract geometric art, vibrant colors" \
  -n 3 \
  -o minimax-output/art.png

# With prompt optimizer
bash scripts/image/generate_image.sh \
  --prompt "A man standing on Venice Beach, 90s documentary style" \
  --aspect-ratio 16:9 --prompt-optimizer \
  -o minimax-output/beach.png

# Custom dimensions (must be multiple of 8)
bash scripts/image/generate_image.sh \
  --prompt "Product photo of a luxury watch on marble surface" \
  --width 1024 --height 768 \
  -o minimax-output/watch.png
```

### Image-to-Image (Character Reference)

Use a reference photo to generate images with the same character in new scenes. Best results with a single front-facing portrait. Supported formats: JPG, JPEG, PNG (max 10MB).

```bash
# Character reference: place the same person in a new scene
bash scripts/image/generate_image.sh \
  --mode i2i \
  --prompt "A girl looking into the distance from a library window, warm afternoon light" \
  --ref-image face.jpg \
  --aspect-ratio 16:9 \
  -o minimax-output/girl_library.png

# Multiple character variations
bash scripts/image/generate_image.sh \
  --mode i2i \
  --prompt "A woman in a red dress at a gala event, elegant, cinematic" \
  --ref-image face.jpg -n 3 \
  -o minimax-output/gala.png
```

### Aspect Ratio Reference

| Ratio | Resolution | Best for |
|-------|------------|----------|
| `1:1` | 1024×1024 | Default, avatars, icons, social media |
| `16:9` | 1280×720 | Landscape, banner, desktop wallpaper |
| `4:3` | 1152×864 | Classic photo, presentations |
| `3:2` | 1248×832 | Photography, magazine layout |
| `2:3` | 832×1248 | Portrait photo, poster |
| `3:4` | 864×1152 | Book cover, tall poster |
| `9:16` | 720×1280 | Phone wallpaper, social story/reel |
| `21:9` | 1344×576 | Ultra-wide panoramic, cinematic |
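
For scripting, the table above can be mirrored as a lookup. The function below is a hypothetical convenience, not part of the skill; it prints the resolution as `WIDTHxHEIGHT` with a plain ASCII `x`.

```bash
# Sketch only (hypothetical function): return the resolution listed in the
# Aspect Ratio Reference table for a given ratio.
resolution_for_ratio() {
  case "$1" in
    1:1)  echo "1024x1024" ;;
    16:9) echo "1280x720" ;;
    4:3)  echo "1152x864" ;;
    3:2)  echo "1248x832" ;;
    2:3)  echo "832x1248" ;;
    3:4)  echo "864x1152" ;;
    9:16) echo "720x1280" ;;
    21:9) echo "1344x576" ;;
    *)    echo "unknown ratio: $1" >&2; return 1 ;;
  esac
}
```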

### Key Options

| Option | Description |
|--------|-------------|
| `--prompt TEXT` | Image description, max 1500 chars (required) |
| `--aspect-ratio RATIO` | Aspect ratio (see table above). Infer from user context |
| `--width PX` / `--height PX` | Custom size, 512–2048, must be a multiple of 8; both are required together. Overridden by `--aspect-ratio` if both are set |
| `-n N` | Number of images to generate, 1–9 (default 1) |
| `--seed N` | Random seed for reproducibility. Same seed + same params → similar results |
| `--prompt-optimizer` | Enable automatic prompt optimization by the API |
| `--ref-image FILE` | Character reference image for i2i mode (local file or URL, JPG/JPEG/PNG, max 10MB) |
| `--no-download` | Print image URLs instead of downloading files |
| `--aigc-watermark` | Add AIGC watermark to generated images |
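
The `--width` / `--height` constraints (512–2048, multiple of 8) can be checked before calling the script. The function name below is an assumption, and rejecting values with a leading zero is a choice made here to avoid octal interpretation in shell arithmetic.

```bash
# Sketch only (hypothetical function): succeed when the value is an integer
# in 512..2048 that is a multiple of 8, as required for --width / --height.
valid_dimension() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;  # reject empty, leading-zero, or non-numeric
  esac
  [ "$1" -ge 512 ] && [ "$1" -le 2048 ] && [ $(( $1 % 8 )) -eq 0 ]
}
```

A caller could guard a request with `valid_dimension "$w" && valid_dimension "$h"` and otherwise fall back to `--aspect-ratio`.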

## Video Generation

### IMPORTANT: Single vs Multi-Segment (choose the right script)

| User intent | Script to use |
|-------------|---------------|
| Default / no special request | `scripts/video/generate_video.sh` (single segment, **10s, 768P**) |
| User explicitly asks for "long video", "multi-scene", "story", or duration > 10s | `scripts/video/generate_long_video.sh` (multi-segment) |

**Default behavior:** Always use single-segment `generate_video.sh` with **duration 10s and resolution 768P** unless the user explicitly asks for a long video, multi-scene video, or specifies a total duration exceeding 10 seconds. Do NOT automatically split into multiple segments; a single 10s video is the standard output. Only use `generate_long_video.sh` when the user clearly needs multi-scene or longer content.

Entry point (single video): `scripts/video/generate_video.sh`
Entry point (long/multi-scene): `scripts/video/generate_long_video.sh`

### Video Model Constraints (MUST follow)

**Duration limits by model and resolution:**

| Model | 720P | 768P | 1080P |
|-------|------|------|-------|
| MiniMax-Hailuo-2.3 | - | 6s or **10s** | 6s only |
| MiniMax-Hailuo-2.3-Fast | - | 6s or **10s** | 6s only |
| MiniMax-Hailuo-02 | - | 6s or **10s** | 6s only |
| T2V-01 / T2V-01-Director | 6s only | - | - |
| I2V-01 / I2V-01-Director / I2V-01-live | 6s only | - | - |
| S2V-01 (ref) | 6s only | - | - |

**Resolution options by model and duration:**

| Model | 6s | 10s |
|-------|-----|-----|
| MiniMax-Hailuo-2.3 | 768P (default), 1080P | 768P only |
| MiniMax-Hailuo-2.3-Fast | 768P (default), 1080P | 768P only |
| MiniMax-Hailuo-02 | 512P, 768P (default), 1080P | 512P, 768P (default) |
| Other models | 720P (default) | Not supported |

**Key rules:**
- **Default: 10s + 768P** (best balance of length and quality for MiniMax-Hailuo-2.3)
- 1080P only supports 6s duration; if the user requests 1080P, set `--duration 6`
- 10s duration only works with 768P (or 512P on Hailuo-02); never combine 10s + 1080P
- Older models (T2V-01, I2V-01, S2V-01) only support 6s at 720P
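
As a sketch of how these rules combine for the default model, the function below picks a valid `--duration` for a requested resolution on MiniMax-Hailuo-2.3. The function name is hypothetical, and falling back to 10s for an unsupported 768P duration is an assumption based on the documented default.

```bash
# Sketch only (hypothetical function): choose a valid --duration for
# MiniMax-Hailuo-2.3 according to the tables above.
#   $1 = resolution (768P or 1080P)
#   $2 = requested duration in seconds (default 10)
hailuo23_duration_for() {
  requested="${2:-10}"
  case "$1" in
    1080P)
      echo 6 ;;                      # 1080P supports 6s only
    768P)
      case "$requested" in
        6|10) echo "$requested" ;;   # both durations are valid at 768P
        *)    echo 10 ;;             # assumption: fall back to the default
      esac ;;
    *)
      echo "unsupported resolution for MiniMax-Hailuo-2.3: $1" >&2
      return 1 ;;
  esac
}
```

A request for 1080P with a 10s duration would therefore be reduced to 6s rather than rejected by the API.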

### IMPORTANT: Prompt Optimization (MUST follow before generating any video)

Before calling any video generation script, you MUST optimize the user's prompt by reading and applying `references/video-prompt-guide.md`. Never pass the user's raw description directly as `--prompt`.

**Optimization steps:**

1. **Apply the Professional Formula**: `Main subject + Scene + Movement + Camera motion + Aesthetic atmosphere`
   - BAD: `"A puppy in a park"`
   - GOOD: `"A golden retriever puppy runs toward the camera on a sun-dappled grass path in a park, [跟随] smooth tracking shot, warm golden hour lighting, shallow depth of field, joyful atmosphere"`

2. **Add camera instructions** using the bracketed `[指令]` (command) syntax: `[推进]` (push in), `[拉远]` (pull out), `[跟随]` (tracking), `[固定]` (fixed), `[左摇]` (pan left), etc.

3. **Include aesthetic details**: lighting (golden hour, dramatic side lighting), color grading (warm tones, cinematic), texture (dust particles, rain droplets), atmosphere (intimate, epic, peaceful)

4. **Keep to 1-2 key actions** for 6-10 second videos; do not overcrowd the clip with events

5. **For i2v mode** (image-to-video): Focus the prompt on **movement and change only**, since the image already establishes the visual. Do NOT re-describe what's in the image.
   - BAD: `"A lake with mountains"` (just repeating the image)
   - GOOD: `"Gentle ripples spread across the water surface, a breeze rustles the distant trees, [固定] fixed camera, soft morning light, peaceful and serene"`

6. **For multi-segment long videos**: Each segment's prompt must be self-contained and optimized individually. The i2v segments (segment 2+) should describe motion/change relative to the previous segment's ending frame.

```bash
# Text-to-video (default: 10s, 768P)
bash scripts/video/generate_video.sh \
  --mode t2v \
  --prompt "A golden retriever puppy bounds toward the camera on a sunlit grass path, [跟随] tracking shot, warm golden hour, shallow depth of field, joyful" \
  --output minimax-output/puppy.mp4

# Text-to-video with 1080P (must use --duration 6)
bash scripts/video/generate_video.sh \
  --mode t2v \
  --prompt "A golden retriever puppy bounds toward the camera" \
  --duration 6 --resolution 1080P \
  --output minimax-output/puppy_hd.mp4

# Image-to-video (prompt focuses on MOTION, not image content)
bash scripts/video/generate_video.sh \
  --mode i2v \
  --prompt "The petals begin to sway gently in the breeze, soft light shifts across the surface, [固定] fixed framing, dreamy pastel tones" \
  --first-frame photo.jpg \
  --output minimax-output/animated.mp4

# Start-end frame interpolation (sef mode uses MiniMax-Hailuo-02)
bash scripts/video/generate_video.sh \
  --mode sef \
  --first-frame start.jpg --last-frame end.jpg \
  --output minimax-output/transition.mp4

# Subject reference (face consistency, ref mode uses S2V-01, 6s only)
bash scripts/video/generate_video.sh \
  --mode ref \
  --prompt "A young woman in a white dress walks slowly through a sunlit garden, [跟随] smooth tracking, warm natural lighting, cinematic depth of field" \
  --subject-image face.jpg \
  --duration 6 \
  --output minimax-output/person.mp4
```

### Long-form Video (Multi-scene)

Multi-scene long videos chain segments together: the first segment is generated via text-to-video (t2v), then each subsequent segment uses the last frame of the previous segment as its first frame (i2v). Segments are joined with crossfade transitions for smooth continuity. The default is 10 seconds per segment.

**Workflow:**
1. Segment 1: t2v, generated purely from the optimized text prompt
2. Segment 2+: i2v, where the previous segment's last frame becomes `first_frame_image` and the prompt describes **motion and change from that ending state**
3. All segments are concatenated with 0.5s crossfade transitions to eliminate jump cuts
4. Optional: AI-generated background music is overlaid

**Prompt rules for each segment:**
- Each segment prompt MUST be independently optimized using the Professional Formula
- Segment 1 (t2v): Full scene description with subject, scene, camera, atmosphere
- Segment 2+ (i2v): Focus on **what changes and moves** from the previous ending frame. Do NOT repeat the visual description; the first frame already provides it
- Maintain visual consistency: keep lighting, color grading, and style keywords consistent across segments
- Each segment covers only 10 seconds of action, so keep it focused
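
For planning the number of scenes, the final length can be estimated from the segment count. The sketch below assumes each crossfade overlaps two adjacent segments (as FFmpeg's `xfade` filter does), so every transition shortens the total by the crossfade duration. Whether `generate_long_video.sh` behaves exactly this way is not confirmed here, and the function name is hypothetical.

```bash
# Sketch only (hypothetical function): estimate total length in seconds of
# N chained segments, assuming each crossfade overlaps adjacent segments.
#   $1 = number of segments (N >= 1)
#   $2 = seconds per segment (default 10)
#   $3 = crossfade seconds (default 0.5)
long_video_duration() {
  awk -v n="$1" -v seg="${2:-10}" -v xf="${3:-0.5}" \
    'BEGIN { printf "%.1f\n", n * seg - (n - 1) * xf }'
}
```

Under that assumption, three default segments give 3 × 10 − 2 × 0.5 = 29.0 seconds.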

```bash
# Example: 3-segment story with optimized per-segment prompts (default: 10s/segment, 768P)
bash scripts/video/generate_long_video.sh \
  --scenes \
    "A lone astronaut stands on a red desert planet surface, wind blowing dust particles, [推进] slow push in toward the visor, dramatic rim lighting, cinematic sci-fi atmosphere" \
    "The astronaut turns and begins walking toward a distant glowing structure on the horizon, dust swirling around boots, [跟随] tracking from behind, vast desolate landscape, golden light from the structure" \
    "The astronaut reaches the structure entrance, a massive doorway pulses with blue energy, [推进] slow push in toward the doorway, light reflects off the visor, awe-inspiring epic scale" \
  --music-prompt "cinematic orchestral ambient, slow build, sci-fi atmosphere" \
  --output minimax-output/long_video.mp4

# With custom settings
bash scripts/video/generate_long_video.sh \
  --scenes "Scene 1 prompt" "Scene 2 prompt" \
  --segment-duration 10 \
  --resolution 768P \
  --crossfade 0.5 \
  --music-prompt "calm ambient background music" \
  --output minimax-output/long_video.mp4
```

### Add Background Music

```bash
bash scripts/video/add_bgm.sh \
  --video input.mp4 \
  --generate-bgm --instrumental \
  --music-prompt "soft piano background" \
  --bgm-volume 0.3 \
  --output minimax-output/output_with_bgm.mp4
```

### Template Video

```bash
bash scripts/video/generate_template_video.sh \
  --template-id 392753057216684038 \
  --media photo.jpg \
  --output minimax-output/template_output.mp4
```

### Video Models

| Mode | Default Model | Default Duration | Default Resolution | Notes |
|------|--------------|-----------------|-------------------|-------|
| t2v | MiniMax-Hailuo-2.3 | 10s | 768P | Latest text-to-video |
| i2v | MiniMax-Hailuo-2.3 | 10s | 768P | Latest image-to-video |
| sef | MiniMax-Hailuo-02 | 6s | 768P | Start-end frame |
| ref | S2V-01 | 6s | 720P | Subject reference, 6s only |
|
||||
|
||||
## Media Tools (Audio/Video Processing)
|
||||
|
||||
Entry point: `scripts/media_tools.sh`
|
||||
|
||||
Standalone FFmpeg-based utilities for format conversion, concatenation, extraction, trimming, and audio overlay. Use these when the user needs to process existing media files without generating new content via MiniMax API.
|
||||
|
||||
### Video Format Conversion
|
||||
|
||||
```bash
|
||||
# Convert between formats (mp4, mov, webm, mkv, avi, ts, flv)
|
||||
bash scripts/media_tools.sh convert-video input.webm -o output.mp4
|
||||
bash scripts/media_tools.sh convert-video input.mp4 -o output.mov
|
||||
|
||||
# With quality / resolution / fps options
|
||||
bash scripts/media_tools.sh convert-video input.mp4 -o output.mp4 \
|
||||
--crf 18 --preset medium --resolution 1920x1080 --fps 30
|
||||
```
|
||||
|
||||
### Audio Format Conversion
|
||||
|
||||
```bash
|
||||
# Convert between formats (mp3, wav, flac, ogg, aac, m4a, opus, wma)
|
||||
bash scripts/media_tools.sh convert-audio input.wav -o output.mp3
|
||||
bash scripts/media_tools.sh convert-audio input.mp3 -o output.flac \
|
||||
--bitrate 320k --sample-rate 48000 --channels 2
|
||||
```
|
||||
|
||||
### Video Concatenation
|
||||
|
||||
```bash
|
||||
# Concatenate with crossfade transition (default 0.5s)
|
||||
bash scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 seg3.mp4 -o merged.mp4
|
||||
|
||||
# Hard cut (no crossfade)
|
||||
bash scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 -o merged.mp4 --crossfade 0
|
||||
```
|
||||
|
||||
### Audio Concatenation
|
||||
|
||||
```bash
|
||||
# Simple concatenation
|
||||
bash scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3
|
||||
|
||||
# With crossfade
|
||||
bash scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3 --crossfade 1
|
||||
```
|
||||
|
||||
### Extract Audio from Video
|
||||
|
||||
```bash
|
||||
# Extract as mp3
|
||||
bash scripts/media_tools.sh extract-audio video.mp4 -o audio.mp3
|
||||
|
||||
# Extract as wav with higher bitrate
|
||||
bash scripts/media_tools.sh extract-audio video.mp4 -o audio.wav --bitrate 320k
|
||||
```
|
||||
|
||||
### Video Trimming

```bash
# Trim by start/end time (seconds)
bash scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 5 --end 15

# Trim by start + duration
bash scripts/media_tools.sh trim-video input.mp4 -o clip.mp4 --start 10 --duration 8
```
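
When trimming is driven from another script, the two invocation styles can be wrapped in a small helper. The following is a minimal Python sketch that only assembles the argument list. `build_trim_command` is a hypothetical name, not part of the toolkit, and treating `--end` and `--duration` as mutually exclusive is an assumption, since the commands above only show them separately.

```python
from typing import List, Optional


def build_trim_command(src: str, dst: str, start: float,
                       end: Optional[float] = None,
                       duration: Optional[float] = None) -> List[str]:
    """Assemble a trim-video invocation (hypothetical helper, not part of the toolkit)."""
    # Assumption: exactly one of --end / --duration is passed per call.
    if (end is None) == (duration is None):
        raise ValueError("pass exactly one of end or duration")
    if end is not None and end <= start:
        raise ValueError("end must be greater than start")
    cmd = ["bash", "scripts/media_tools.sh", "trim-video", src, "-o", dst,
           "--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    else:
        cmd += ["--duration", str(duration)]
    return cmd


if __name__ == "__main__":
    # Prints the command line; hand the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_trim_command("input.mp4", "clip.mp4", start=5, end=15)))
```

Returning a list rather than a shell string avoids quoting problems when file names contain spaces.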

### Add Audio to Video (Overlay / Replace)

```bash
# Mix audio with existing video audio
bash scripts/media_tools.sh add-audio --video video.mp4 --audio bgm.mp3 -o output.mp4 \
  --volume 0.3 --fade-in 2 --fade-out 3

# Replace original audio entirely
bash scripts/media_tools.sh add-audio --video video.mp4 --audio narration.mp3 -o output.mp4 \
  --replace
```

### Media File Info

```bash
bash scripts/media_tools.sh probe input.mp4
```

## Script Architecture

```
scripts/
├── check_environment.sh            # Env verification (curl, ffmpeg, jq, xxd, API key)
├── media_tools.sh                  # Audio/video conversion, concat, trim, extract
├── tts/
│   └── generate_voice.sh           # Unified TTS CLI (tts, clone, design, list-voices, generate, merge, convert)
├── music/
│   └── generate_music.sh           # Music generation CLI
├── image/
│   └── generate_image.sh           # Image generation CLI (2 modes: t2i, i2i)
└── video/
    ├── generate_video.sh           # Video generation CLI (4 modes: t2v, i2v, sef, ref)
    ├── generate_long_video.sh      # Multi-scene long video
    ├── generate_template_video.sh  # Template-based video
    └── add_bgm.sh                  # Background music overlay
```

## References

Read these for detailed API parameters, voice catalogs, and prompt engineering:

- [tts-guide.md](references/tts-guide.md) — TTS setup, voice management, audio processing, segment format, troubleshooting
- [tts-voice-catalog.md](references/tts-voice-catalog.md) — Full voice catalog with IDs, descriptions, and parameter reference
- [music-api.md](references/music-api.md) — Music generation API: endpoints, parameters, response format
- [image-api.md](references/image-api.md) — Image generation API: text-to-image, image-to-image, parameters
- [video-api.md](references/video-api.md) — Video API: endpoints, models, parameters, camera instructions, templates
- [video-prompt-guide.md](references/video-prompt-guide.md) — Video prompt engineering: formulas, styles, image-to-video tips

115
skills/minimax-multimodal-toolkit/references/image-api.md
Normal file
@@ -0,0 +1,115 @@
# MiniMax Image Generation API (image-01)

Source: https://platform.minimaxi.com/docs/api-reference/image-generation-t2i and https://platform.minimaxi.com/docs/api-reference/image-generation-i2i

## Endpoint

`POST https://api.minimaxi.com/v1/image_generation`

## Auth

`Authorization: Bearer <MINIMAX_API_KEY>`

## Request (JSON)

Required:
- `model`: string — `image-01`
- `prompt`: string (max 1500 chars) — text description of the desired image

Optional:
- `aspect_ratio`: string — image aspect ratio, default `1:1`. Options:
  - `1:1` (1024×1024)
  - `16:9` (1280×720)
  - `4:3` (1152×864)
  - `3:2` (1248×832)
  - `2:3` (832×1248)
  - `3:4` (864×1152)
  - `9:16` (720×1280)
  - `21:9` (1344×576)
- `width`: integer — custom width in pixels. Range [512, 2048], must be a multiple of 8. Overridden by `aspect_ratio` if both are set.
- `height`: integer — custom height in pixels. Same rules as `width`. `width` and `height` must be set together.
- `response_format`: string — `url` (default, valid 24h) or `base64`
- `n`: integer (1–9, default 1) — number of images to generate
- `seed`: integer — random seed for reproducibility
- `prompt_optimizer`: boolean (default `false`) — enable automatic prompt optimization
- `aigc_watermark`: boolean (default `false`) — add AIGC watermark

### Subject Reference (image-to-image)

- `subject_reference`: array — character reference for image-to-image generation
  - `type`: string — currently only `character` (portrait)
  - `image_file`: string — reference image as a public URL or Base64 Data URL (`data:image/jpeg;base64,...`). For best results, use a front-facing photo of a single person. Formats: JPG, JPEG, PNG. Max size: 10MB.
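
The limits listed above can be checked client-side before a request is sent. The following is a minimal Python sketch under stated assumptions: `build_image_request` and `to_data_url` are hypothetical helper names, and rejecting an empty prompt is an assumption (the reference only states the 1500-character maximum).

```python
import base64
from typing import Any, Dict, Optional

# Values copied from the parameter list above.
ASPECT_RATIOS = {"1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"}
MAX_PROMPT_CHARS = 1500


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a Base64 Data URL suitable for `image_file`."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def build_image_request(prompt: str, aspect_ratio: Optional[str] = None,
                        width: Optional[int] = None, height: Optional[int] = None,
                        n: int = 1, reference_image: Optional[str] = None) -> Dict[str, Any]:
    """Validate inputs against the documented limits and return the JSON body."""
    # Assumption: an empty prompt is rejected; only the upper bound is documented.
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be non-empty and at most 1500 characters")
    if not 1 <= n <= 9:
        raise ValueError("n must be between 1 and 9")
    body: Dict[str, Any] = {"model": "image-01", "prompt": prompt, "n": n}
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
        # aspect_ratio overrides width/height, so the custom size is simply not sent.
        body["aspect_ratio"] = aspect_ratio
    elif width is not None or height is not None:
        if width is None or height is None:
            raise ValueError("width and height must be set together")
        for side in (width, height):
            if not 512 <= side <= 2048 or side % 8:
                raise ValueError("width/height must be in [512, 2048] and a multiple of 8")
        body["width"], body["height"] = width, height
    if reference_image is not None:
        # reference_image is a public URL or the output of to_data_url().
        body["subject_reference"] = [{"type": "character", "image_file": reference_image}]
    return body
```

Failing early on the client avoids spending a request on a call that would presumably come back as `2013` (invalid parameters).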

## Example — Text-to-Image

```json
{
  "model": "image-01",
  "prompt": "A man in a white t-shirt, full-body, standing front view, outdoors, with the Venice Beach sign in the background, Los Angeles. Fashion photography in 90s documentary style, film grain, photorealistic.",
  "aspect_ratio": "16:9",
  "response_format": "url",
  "n": 3,
  "prompt_optimizer": true
}
```

## Example — Image-to-Image (Character Reference)

```json
{
  "model": "image-01",
  "prompt": "A girl looking into the distance from a library window",
  "aspect_ratio": "16:9",
  "subject_reference": [
    {
      "type": "character",
      "image_file": "https://example.com/face.jpg"
    }
  ],
  "n": 2
}
```
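
A JSON body like the ones above still has to be wrapped in an authenticated POST. The following is a minimal Python sketch using only the standard library. `make_request` is a hypothetical helper, and the `Content-Type: application/json` header is an assumption based on the request being JSON; the reference itself only specifies the `Authorization` header.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.minimaxi.com/v1/image_generation"


def make_request(body: dict, api_key: str) -> urllib.request.Request:
    """Wrap a JSON body in a POST request carrying the Bearer token."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = {"model": "image-01", "prompt": "A lighthouse at dusk", "n": 1}
    req = make_request(payload, os.environ.get("MINIMAX_API_KEY", ""))
    # Sending is left commented out because it needs a valid key and network access:
    # with urllib.request.urlopen(req, timeout=60) as resp:
    #     print(json.load(resp)["base_resp"])
    print(req.get_method(), req.full_url)
```

Building the `Request` separately from sending it keeps the payload and headers easy to inspect.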

## Response

```json
{
  "id": "03ff3cd0820949eb8a410056b5f21d38",
  "data": {
    "image_urls": ["https://...", "https://...", "https://..."],
    "image_base64": null
  },
  "metadata": {
    "success_count": 3,
    "failed_count": 0
  },
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
```

- `data.image_urls`: array of image URLs (when `response_format` is `url`, valid 24h)
- `data.image_base64`: array of Base64 strings (when `response_format` is `base64`)
- `metadata.success_count`: number of successfully generated images
- `metadata.failed_count`: number of images blocked by content safety
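
The response fields described above can be unpacked as follows. This is a minimal Python sketch, not the toolkit's own parser: `extract_images` and `ImageApiError` are hypothetical names, and the `int(...)` conversion on `failed_count` is a defensive assumption in case the count arrives as a string.

```python
from typing import Any, Dict, List


class ImageApiError(RuntimeError):
    """Raised when base_resp.status_code is non-zero (hypothetical helper type)."""


def extract_images(response: Dict[str, Any]) -> List[str]:
    """Return the image URLs or Base64 strings from a parsed response."""
    base = response.get("base_resp") or {}
    if base.get("status_code", -1) != 0:
        raise ImageApiError(f"{base.get('status_code')}: {base.get('status_msg')}")
    data = response.get("data") or {}
    # Which array is populated depends on the response_format that was requested.
    images = data.get("image_urls") or data.get("image_base64") or []
    failed = int((response.get("metadata") or {}).get("failed_count", 0) or 0)
    if failed:
        print(f"warning: {failed} image(s) blocked by content safety")
    return list(images)
```

Because URL results expire after 24 hours, a caller would normally download them right away.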

## Status Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1002 | Rate limited, retry later |
| 1004 | Auth failed, check API key |
| 1008 | Insufficient balance |
| 1026 | Prompt contains sensitive content |
| 2013 | Invalid parameters |
| 2049 | Invalid API key |
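
One way to act on these codes is to separate the retryable case from the ones that need a fix on the caller's side. The following is a minimal Python sketch; `classify_status` and `backoff_delays` are hypothetical helpers, and the exponential backoff schedule is an assumption, since the table only says to retry later.

```python
from typing import List

RETRYABLE = {1002}            # rate limited
CREDENTIALS = {1004, 2049}    # auth failed / invalid API key


def classify_status(code: int) -> str:
    """Map a base_resp.status_code to a coarse action."""
    if code == 0:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    if code in CREDENTIALS:
        return "fix_credentials"
    # 1008, 1026, 2013 and any unlisted code: retrying the same request will not help.
    return "fail"


def backoff_delays(attempts: int, base: float = 1.0) -> List[float]:
    """Delays in seconds before each retry, doubling every time (assumed policy)."""
    return [base * 2 ** i for i in range(attempts)]
```

Unlisted codes fall into `fail` so that an unknown error is never retried in a loop.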

## Notes

- The API is synchronous — images are returned directly in the response (no polling needed).
- Image links returned in `url` format expire after 24 hours.
- For image-to-image: upload a single front-facing portrait for best character reference results.
- `width`/`height` are overridden by `aspect_ratio` if both are provided.

57
skills/minimax-multimodal-toolkit/references/music-api.md
Normal file
@@ -0,0 +1,57 @@
# MiniMax Music Generation API (music-2.5)

Source: https://platform.minimaxi.com/docs/api-reference/music-generation

## Endpoint

`POST https://api.minimaxi.com/v1/music_generation`

## Auth

`Authorization: Bearer <MINIMAX_API_KEY>`

## Request (JSON)

Required:
- `model`: string — `music-2.5`
- `lyrics`: string (1–3500 chars) — required. Use `\n` for line breaks. Structure tags: `[Verse]`, `[Chorus]`, `[Bridge]`, `[Intro]`, `[Outro]`, etc.

Optional:
- `prompt`: string (0–2000 chars) — style description, optional but recommended.
- `lyrics_optimizer`: boolean — auto-generate lyrics from prompt when lyrics is empty.
- `stream`: boolean (default `false`)
- `output_format`: `hex` (default) or `url`. URL valid for 24 hours.
- `aigc_watermark`: boolean — top-level field, non-streaming only.
- `audio_setting`:
  - `sample_rate`: 16000, 24000, 32000, 44100
  - `bitrate`: 32000, 64000, 128000, 256000
  - `format`: mp3, wav, pcm

## Example

```json
{
  "model": "music-2.5",
  "prompt": "indie folk, melancholic, introspective",
  "lyrics": "[verse]\n...\n[chorus]\n...",
  "aigc_watermark": false,
  "audio_setting": {
    "sample_rate": 44100,
    "bitrate": 256000,
    "format": "mp3"
  }
}
```

## Response

- `data.audio`: hex string or URL depending on `output_format`
- `data.status`: 1 (generating), 2 (complete)
- `extra_info`: duration, sample_rate, channels, bitrate, size
- `base_resp.status_code`: 0 on success
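
With the default `hex` output, `data.audio` has to be decoded before it can be written to disk. The following is a minimal Python sketch for the non-streaming case. `decode_audio` is a hypothetical helper, and treating any status other than 2 as an error is an assumption that only makes sense when `stream` is `false`.

```python
from typing import Any, Dict


def decode_audio(response: Dict[str, Any]) -> bytes:
    """Return raw audio bytes from a non-streaming, hex-format response."""
    base = response.get("base_resp") or {}
    if base.get("status_code") != 0:
        raise RuntimeError(f"API error {base.get('status_code')}: {base.get('status_msg')}")
    data = response.get("data") or {}
    if data.get("status") != 2:
        raise RuntimeError("generation is not complete (data.status != 2)")
    audio = data.get("audio") or ""
    if audio.startswith("http"):
        # output_format was `url`: download the file instead of decoding it.
        raise ValueError("data.audio is a URL, not a hex string")
    return bytes.fromhex(audio)


if __name__ == "__main__":
    # Tiny fabricated payload for illustration only: the three bytes "ID3".
    demo = {"data": {"audio": "494433", "status": 2}, "base_resp": {"status_code": 0}}
    print(decode_audio(demo))
```

The returned bytes can be written with `pathlib.Path("song.mp3").write_bytes(...)`, using the extension that matches `audio_setting.format`.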

## Notes

- `music-2.5` does not support `is_instrumental`. For instrumental music, use lyrics `[intro] [outro]` and add `pure music, no lyrics` to the prompt.
- `prompt` is optional but recommended for better style control.
- `stream=true` only supports `hex` output.
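
The instrumental workaround and the streaming restriction can both be encoded in a small request builder. The following is a minimal Python sketch; `build_music_request` is a hypothetical helper, and falling back to the instrumental form whenever no lyrics are passed is a design choice of this sketch rather than API behavior.

```python
from typing import Any, Dict, Optional


def build_music_request(style: str, lyrics: Optional[str] = None,
                        stream: bool = False, output_format: str = "hex") -> Dict[str, Any]:
    """Build a music-2.5 request body; without lyrics, use the instrumental workaround."""
    if output_format not in ("hex", "url"):
        raise ValueError("output_format must be 'hex' or 'url'")
    if stream and output_format != "hex":
        raise ValueError("stream=true only supports hex output")
    if lyrics is None:
        # Instrumental: placeholder structure tags plus an explicit style hint.
        lyrics = "[intro] [outro]"
        prompt = f"{style}, pure music, no lyrics"
    else:
        prompt = style
    if not 1 <= len(lyrics) <= 3500:
        raise ValueError("lyrics must be 1-3500 characters")
    if len(prompt) > 2000:
        raise ValueError("prompt must be at most 2000 characters")
    return {"model": "music-2.5", "prompt": prompt, "lyrics": lyrics,
            "stream": stream, "output_format": output_format}
```

For example, `build_music_request("lo-fi piano")` yields a body whose prompt ends in `pure music, no lyrics`.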
111
skills/minimax-multimodal-toolkit/references/tts-guide.md
Normal file
@@ -0,0 +1,111 @@
# TTS Guide

## Setup

```bash
cd skills/MiniMaxStudio
pip install -r requirements.txt
brew install ffmpeg  # macOS (or: sudo apt install ffmpeg)
export MINIMAX_API_KEY="your-api-key"  # sk-api-xxx or sk-cp-xxx
python scripts/check_environment.py
```

## Quick Test

```bash
python scripts/tts/generate_voice.py tts "Hello, this is a test." -o test.mp3
```

## Voice Management

List available voices:

```bash
python scripts/tts/generate_voice.py list-voices
```

### Voice Cloning

Create a custom voice from an audio sample:

```bash
python scripts/tts/generate_voice.py clone audio.mp3 --voice-id my-custom-voice

# With preview
python scripts/tts/generate_voice.py clone audio.mp3 --voice-id my-voice --preview "Test text" --preview-output preview.mp3
```

Requirements: 10s–5min duration, ≤20MB, mp3/wav/m4a format.
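
These requirements can be checked before uploading a sample. The following is a minimal Python sketch; `check_clone_sample` is a hypothetical helper that takes the size and duration as arguments (for instance from `os.path.getsize` and a media probe), and interpreting 20MB as 20 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MAX_BYTES = 20 * 1024 * 1024       # assumption: MB means 2**20 bytes
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60


def check_clone_sample(path: str, size_bytes: int, duration_s: float) -> List[str]:
    """Return a list of problems; an empty list means the sample looks acceptable."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be mp3, wav, or m4a")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 20MB")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append("duration must be between 10 seconds and 5 minutes")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report them all in one message.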

### Voice Design

Design a voice from a text description:

```bash
python scripts/tts/generate_voice.py design "A warm, gentle female voice" --voice-id designed-voice
```

Custom voices expire after 7 days if not used with TTS.

## Audio Processing

### Merge

```bash
python scripts/tts/generate_voice.py merge file1.mp3 file2.mp3 -o combined.mp3
python scripts/tts/generate_voice.py merge a.mp3 b.mp3 -o merged.mp3 --crossfade 300
```

### Convert

```bash
python scripts/tts/generate_voice.py convert input.wav -o output.mp3
python scripts/tts/generate_voice.py convert input.wav -o output.mp3 --format mp3 --bitrate 192k --sample-rate 32000
```

FFmpeg required. Supported formats: mp3, wav, flac, ogg, m4a, aac, wma, opus, pcm.

## Segment-Based TTS

For multi-voice, multi-emotion workflows using a `segments.json` file:

```bash
# Validate
python scripts/tts/generate_voice.py validate segments.json --verbose

# Generate
python scripts/tts/generate_voice.py generate segments.json -o output.mp3 --crossfade 200
```

### segments.json Format

```json
[
  { "text": "Hello!", "voice_id": "female-shaonv", "emotion": "" },
  { "text": "How are you?", "voice_id": "male-qn-qingse", "emotion": "happy" }
]
```

- `text` (required): Text to synthesize
- `voice_id` (required): Voice ID
- `emotion` (optional): For speech-2.8 models, leave empty for auto-matching. Valid values: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper
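
The field rules above are simple enough to check in a few lines. The following is a minimal Python sketch of such a check. It is not the implementation behind the `validate` subcommand, whose exact rules are not shown here; `validate_segments` is a hypothetical function.

```python
from typing import Any, List

VALID_EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted",
                  "surprised", "calm", "fluent", "whisper"}


def validate_segments(segments: Any) -> List[str]:
    """Return human-readable errors for a parsed segments.json document."""
    if not isinstance(segments, list) or not segments:
        return ["top level must be a non-empty JSON array"]
    errors = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            errors.append(f"segment {i}: must be an object")
            continue
        for key in ("text", "voice_id"):
            value = seg.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"segment {i}: `{key}` is required")
        emotion = seg.get("emotion", "")
        # An empty emotion is allowed and means auto-matching.
        if emotion and (not isinstance(emotion, str) or emotion not in VALID_EMOTIONS):
            errors.append(f"segment {i}: unknown emotion {emotion!r}")
    return errors
```

A file would be loaded with `json.load(open("segments.json", encoding="utf-8"))` and passed to this function before generation.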

## Troubleshooting

| Error | Solution |
|-------|----------|
| `MINIMAX_API_KEY is required` | `export MINIMAX_API_KEY="key"` |
| `FFmpeg not installed` | `brew install ffmpeg` |
| `Voice not found` | `python scripts/tts/generate_voice.py list-voices` |
| `401 Unauthorized` | Check API key validity |
| `429 Too Many Requests` | Add delays between requests |

## API Details

- **Endpoint**: `POST /v1/t2a_v2`
- **Base URL**: `https://api.minimaxi.com`
- **Auth**: `Authorization: Bearer {MINIMAX_API_KEY}`
- **Models**: speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo
- **Text limit**: 10,000 characters per request
- **Pause marker**: `<#x#>` where x is seconds (0.01–99.99)
- **Interjection tags** (speech-2.8 only): `(laughs)`, `(chuckle)`, `(coughs)`, `(sighs)`, `(breath)`, etc.
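
The pause marker and the per-request text limit can be handled with two small helpers. The following is a minimal Python sketch; `pause` and `chunk_text` are hypothetical names, formatting the pause with two decimals is an assumption about what the API accepts, and the word-based splitting only suits text that contains whitespace (it also collapses runs of whitespace).

```python
from typing import List

MAX_CHARS = 10_000


def pause(seconds: float) -> str:
    """Format a pause marker such as <#1.50#>."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own request and the resulting audio files merged afterwards.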
@@ -0,0 +1,543 @@
# TTS Voice Catalog

## Contents

- [Voice Selection Guide](#voice-selection-guide)
- [System Voices by Language](#system-voices-by-language)
- [Voice Parameters](#voice-parameters)
- [Custom Voices](#custom-voices)

---

## Voice Selection Guide

### Decision Flow

```
Content type?
├── Narration / Audiobook → audiobook_female_1, audiobook_male_1
├── News / Announcement → Chinese (Mandarin)_News_Anchor, Chinese (Mandarin)_Male_Announcer
├── Documentary → doc_commentary
└── Other → Select by: Gender → Age → Language → Personality
```

### Recommended Professional Voices

| Scenario | Recommended | Characteristics |
|----------|-------------|-----------------|
| Narration / Audiobook | `audiobook_female_1`, `audiobook_male_1` | Clear articulation, good pacing, sustained performance |
| News / Announcement | `Chinese (Mandarin)_News_Anchor`, `Chinese (Mandarin)_Male_Announcer` | Authoritative, professional pacing |
| Documentary | `doc_commentary` | Professional, clear, consistent |

### Selection Priority

1. **Gender** (mandatory match) — male voices for male characters, female for female
2. **Age** — Children / Youth / Adult / Elderly
3. **Language** (must match content language)
4. **Personality/tone** — choose best fit from matching candidates
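
The priority order above can be sketched as a filter over catalog rows. The following Python example is illustrative only: `Voice`, `CATALOG`, and `pick_voices` are hypothetical, the catalog holds just a handful of rows copied from the tables in this file, and the language codes (`zh`, `en`, `ja`) are labels invented for the sketch. Treating age as a preference that falls back when nothing matches is also an assumption.

```python
from typing import List, NamedTuple, Optional


class Voice(NamedTuple):
    voice_id: str
    gender: str      # M / F / N
    age: str         # C / Y / A / E
    language: str    # hypothetical code, not an API value
    best_for: str


# A small excerpt of the catalog tables, for illustration.
CATALOG = [
    Voice("male-qn-qingse", "M", "Y", "zh", "Campus, coming-of-age"),
    Voice("female-yujie", "F", "A", "zh", "Romance, professional"),
    Voice("English_Trustworthy_Man", "M", "A", "en", "Business, narration"),
    Voice("English_Graceful_Lady", "F", "A", "en", "Formal, professional"),
    Voice("Serene_Woman", "F", "A", "en", "Meditation, relaxation"),
    Voice("Japanese_CalmLady", "F", "A", "ja", "Meditation, relaxation"),
]


def pick_voices(gender: str, language: str, age: Optional[str] = None,
                keyword: Optional[str] = None) -> List[Voice]:
    """Apply the selection priority: gender and language are hard filters."""
    candidates = [v for v in CATALOG if v.gender == gender and v.language == language]
    if age:
        # Age narrows the list, but falls back if no voice of that age exists.
        candidates = [v for v in candidates if v.age == age] or candidates
    if keyword:
        # Stable sort: voices whose "Best For" mentions the keyword come first.
        candidates = sorted(candidates, key=lambda v: keyword.lower() not in v.best_for.lower())
    return candidates
```

For example, `pick_voices("F", "en", keyword="meditation")` ranks `Serene_Woman` ahead of `English_Graceful_Lady`.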

---

## System Voices by Language

Gender: M = Male, F = Female, N = Neutral/Character
Age: C = Child, Y = Youth, A = Adult, E = Elder

### Chinese Mandarin (普通话)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `male-qn-qingse` | 青涩青年 | M | Y | Youthful, inexperienced | Campus, coming-of-age |
| `male-qn-badao` | 霸道青年 | M | Y | Arrogant, dominant | Drama, romance |
| `male-qn-daxuesheng` | 青年大学生 | M | Y | University student | Campus, educational |
| `male-qn-jingying` | 精英青年 | M | A | Elite, ambitious | Business, professional |
| `female-shaonv` | 少女 | F | Y | Young maiden | Romance, youth |
| `female-yujie` | 御姐 | F | A | Mature, elegant | Romance, professional |
| `female-chengshu` | 成熟女性 | F | A | Mature, reliable | Sophisticated, news |
| `female-tianmei` | 甜美女性 | F | A | Sweet, pleasant | Soft, gentle |
| `clever_boy` | 聪明男童 | M | C | Smart, witty | Children's, educational |
| `cute_boy` | 可爱男童 | M | C | Adorable | Kids, animations |
| `lovely_girl` | 萌萌女童 | F | C | Cute, sweet | Children's stories |
| `cartoon_pig` | 卡通猪小琪 | N | C | Cartoon character | Animations, comedy |
| `bingjiao_didi` | 病娇弟弟 | M | Y | Yandere younger brother | Romance, character |
| `junlang_nanyou` | 俊朗男友 | M | Y | Handsome boyfriend | Romance, dating |
| `chunzhen_xuedi` | 纯真学弟 | M | Y | Innocent junior | Campus, youth |
| `lengdan_xiongzhang` | 冷淡学长 | M | Y | Cool senior | Campus, romance |
| `badao_shaoye` | 霸道少爷 | M | A | Arrogant young master | Drama, character |
| `tianxin_xiaoling` | 甜心小玲 | F | Y | Sweet Xiao Ling | Character, animations |
| `qiaopi_mengmei` | 俏皮萌妹 | F | Y | Playful cute girl | Comedy, light-hearted |
| `wumei_yujie` | 妩媚御姐 | F | A | Charming mature woman | Romance, mature |
| `diadia_xuemei` | 嗲嗲学妹 | F | Y | Flirty junior girl | Romance, dating |
| `danya_xuejie` | 淡雅学姐 | F | Y | Elegant senior girl | Campus, romance |
| `Arrogant_Miss` | 嚣张小姐 | F | A | Arrogant young lady | Drama, character |
| `Robot_Armor` | 机械战甲 | N | A | Robotic armor | Sci-fi, games |
| `audiobook_male_1` | 有声书男1 | M | A | Warm, engaging narrator | Audiobooks, stories |
| `audiobook_female_1` | 有声书女1 | F | A | Gentle, expressive narrator | Audiobooks, stories |
| `doc_commentary` | 纪录片解说 | M | A | Professional narrator | Documentary |
| `Chinese (Mandarin)_News_Anchor` | 新闻女声 | F | A | News anchor | News, broadcasts |
| `Chinese (Mandarin)_Male_Announcer` | 播报男声 | M | A | Male announcer | Announcements |
| `Chinese (Mandarin)_Radio_Host` | 电台男主播 | M | A | Radio host | Podcasts, radio |
| `Chinese (Mandarin)_Reliable_Executive` | 沉稳高管 | M | A | Reliable executive | Corporate, business |
| `Chinese (Mandarin)_Gentleman` | 温润男声 | M | A | Gentle, refined | Narration, storytelling |
| `Chinese (Mandarin)_Unrestrained_Young_Man` | 不羁青年 | M | Y | Unrestrained, casual | Entertainment |
| `Chinese (Mandarin)_Southern_Young_Man` | 南方小哥 | M | Y | Southern accent | Regional, casual |
| `Chinese (Mandarin)_Gentle_Youth` | 温润青年 | M | Y | Gentle young man | Narration, calm |
| `Chinese (Mandarin)_Sincere_Adult` | 真诚青年 | M | Y | Sincere, genuine | Honest, genuine |
| `Chinese (Mandarin)_Straightforward_Boy` | 率真弟弟 | M | Y | Frank, direct | Casual, direct |
| `Chinese (Mandarin)_Pure-hearted_Boy` | 清澈邻家弟弟 | M | Y | Pure-hearted neighbor | Innocent, wholesome |
| `Chinese (Mandarin)_Stubborn_Friend` | 嘴硬竹马 | M | Y | Stubborn childhood friend | Drama, character |
| `Chinese (Mandarin)_Lyrical_Voice` | 抒情男声 | M | A | Lyrical, singing | Music, singing |
| `Chinese (Mandarin)_Mature_Woman` | 傲娇御姐 | F | A | Tsundere mature woman | Romance, character |
| `Chinese (Mandarin)_Sweet_Lady` | 甜美女声 | F | A | Sweet lady | Soft, gentle |
| `Chinese (Mandarin)_Warm_Bestie` | 温暖闺蜜 | F | A | Warm bestie | Friendly, supportive |
| `Chinese (Mandarin)_Warm_Girl` | 温暖少女 | F | Y | Warm young girl | Friendly, supportive |
| `Chinese (Mandarin)_Soft_Girl` | 柔和少女 | F | Y | Soft, gentle | Calm, soothing |
| `Chinese (Mandarin)_Crisp_Girl` | 清脆少女 | F | Y | Crisp, clear | Bright, clear |
| `Chinese (Mandarin)_Gentle_Senior` | 温柔学姐 | F | Y | Gentle senior girl | Campus, supportive |
| `Chinese (Mandarin)_Wise_Women` | 阅历姐姐 | F | A | Experienced, wise | Advice, guidance |
| `Chinese (Mandarin)_HK_Flight_Attendant` | 港普空姐 | F | A | HK accent flight attendant | Regional, entertainment |
| `Chinese (Mandarin)_Cute_Spirit` | 憨憨萌兽 | N | C | Cute cartoon spirit | Animations, children's |
| `Chinese (Mandarin)_Humorous_Elder` | 搞笑大爷 | M | E | Humorous old man | Comedy, entertainment |
| `Chinese (Mandarin)_Kind-hearted_Elder` | 花甲奶奶 | F | E | Kind elderly lady | Stories, warm |
| `Chinese (Mandarin)_Kind-hearted_Antie` | 热心大婶 | F | E | Kind-hearted auntie | Warm, friendly |

### Chinese Cantonese (粤语)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Cantonese_ProfessionalHost(F)` | 专业女主持 | F | A | Professional host | Broadcasts, hosting |
| `Cantonese_GentleLady` | 温柔女声 | F | A | Gentle female | Soft, warm |
| `Cantonese_ProfessionalHost(M)` | 专业男主持 | M | A | Professional host | Broadcasts, hosting |
| `Cantonese_PlayfulMan` | 活泼男声 | M | A | Playful male | Entertainment, casual |
| `Cantonese_CuteGirl` | 可爱女孩 | F | C | Cute girl | Children's, animations |
| `Cantonese_KindWoman` | 善良女声 | F | A | Kind female | Warm, friendly |

### English

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `English_Trustworthy_Man` | Trustworthy Man | M | A | Reliable, sincere | Business, narration |
| `English_Graceful_Lady` | Graceful Lady | F | A | Elegant, refined | Formal, professional |
| `English_Aussie_Bloke` | Aussie Bloke | M | A | Casual Australian | Casual, entertainment |
| `English_Whispering_girl` | Whispering Girl | F | Y | Soft whisper | Romance, intimate |
| `English_Diligent_Man` | Diligent Man | M | A | Earnest, hardworking | Motivational, educational |
| `English_Gentle-voiced_man` | Gentle-voiced Man | M | E | Soft-spoken, kind | Calm, supportive |
| `English_Sweet_Girl` | Sweet Girl | F | C | Sweet, innocent | Children's, friendly |
| `Charming_Lady` | Charming Lady | F | A | Elegant, sophisticated | Professional, romance |
| `Attractive_Girl` | Attractive Girl | F | Y | Engaging female | Entertainment, marketing |
| `Serene_Woman` | Serene Woman | F | A | Calm, peaceful | Meditation, relaxation |
| `Santa_Claus` | Santa Claus | M | E | Festive, jolly | Holiday, children's |
| `Charming_Santa` | Charming Santa | M | E | Smooth, charismatic | Holiday, entertainment |
| `Grinch` | Grinch | M | A | Whiny, mischievous | Comedy, holiday |
| `Rudolph` | Rudolph | N | C | Cute, nasal reindeer | Children's, holiday |
| `Arnold` | Arnold | M | A | Deep, robotic | Sci-fi, action |
| `Cute_Elf` | Cute Elf | N | C | Playful, tiny elf | Fantasy, children's |

### Japanese (日本語)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Japanese_IntellectualSenior` | Intellectual Senior | M | E | Wise, knowledgeable | Narration, educational |
| `Japanese_DecisivePrincess` | Decisive Princess | F | A | Confident, royal | Animation, games |
| `Japanese_LoyalKnight` | Loyal Knight | M | A | Brave, faithful | Fantasy, games |
| `Japanese_DominantMan` | Dominant Man | M | A | Powerful, commanding | Action, leadership |
| `Japanese_SeriousCommander` | Serious Commander | M | A | Stern, authoritative | Military, games |
| `Japanese_ColdQueen` | Cold Queen | F | A | Distant, majestic | Drama, fantasy |
| `Japanese_DependableWoman` | Dependable Woman | F | A | Reliable, supportive | Guidance |
| `Japanese_GentleButler` | Gentle Butler | M | A | Polite, refined | Comedy, animation |
| `Japanese_KindLady` | Kind Lady | F | A | Warm, gentle | Comforting |
| `Japanese_CalmLady` | Calm Lady | F | A | Composed, serene | Meditation, relaxation |
| `Japanese_OptimisticYouth` | Optimistic Youth | M | Y | Cheerful, positive | Youth, motivation |
| `Japanese_GenerousIzakayaOwner` | Generous Izakaya Owner | M | A | Friendly, welcoming | Casual, comedy |
| `Japanese_SportyStudent` | Sporty Student | M | Y | Energetic, athletic | Sports, youth |
| `Japanese_InnocentBoy` | Innocent Boy | M | C | Pure, naive | Children's |
| `Japanese_GracefulMaiden` | Graceful Maiden | F | Y | Elegant, gentle | Romance, drama |

### Korean (한국어)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Korean_SweetGirl` | Sweet Girl | F | C | Sweet, adorable | Children's, romance |
| `Korean_CheerfulBoyfriend` | Cheerful Boyfriend | M | Y | Energetic, loving | Romance, dating |
| `Korean_EnchantingSister` | Enchanting Sister | F | A | Charming, captivating | Family, drama |
| `Korean_ShyGirl` | Shy Girl | F | Y | Timid, reserved | Comedy, romance |
| `Korean_ReliableSister` | Reliable Sister | F | A | Trustworthy, dependable | Guidance |
| `Korean_StrictBoss` | Strict Boss | M | A | Authoritative, demanding | Business, drama |
| `Korean_SassyGirl` | Sassy Girl | F | Y | Bold, witty | Comedy, entertainment |
| `Korean_ChildhoodFriendGirl` | Childhood Friend Girl | F | Y | Familiar, friendly | Romance, nostalgia |
| `Korean_PlayboyCharmer` | Playboy Charmer | M | A | Smooth, flirtatious | Romance, entertainment |
| `Korean_ElegantPrincess` | Elegant Princess | F | A | Graceful, royal | Animation, fantasy |
| `Korean_BraveFemaleWarrior` | Brave Female Warrior | F | A | Courageous | Action, fantasy |
| `Korean_BraveYouth` | Brave Youth | M | Y | Heroic | Action, youth |
| `Korean_CalmLady` | Calm Lady | F | A | Composed, serene | Meditation, relaxation |
| `Korean_EnthusiasticTeen` | Enthusiastic Teen | M | Y | Excited, energetic | Youth |
| `Korean_SoothingLady` | Soothing Lady | F | A | Calming, comforting | Relaxation |
| `Korean_IntellectualSenior` | Intellectual Senior | M | E | Wise, knowledgeable | Educational, narration |
| `Korean_LonelyWarrior` | Lonely Warrior | M | A | Solitary, melancholic | Drama, fantasy |
| `Korean_MatureLady` | Mature Lady | F | A | Sophisticated | Professional, drama |
| `Korean_InnocentBoy` | Innocent Boy | M | C | Pure, naive | Children's |
| `Korean_CharmingSister` | Charming Sister | F | A | Attractive, delightful | Family, romance |
| `Korean_AthleticStudent` | Athletic Student | M | Y | Sporty, energetic | Sports, youth |
| `Korean_BraveAdventurer` | Brave Adventurer | M | A | Courageous explorer | Adventure, fantasy |
| `Korean_CalmGentleman` | Calm Gentleman | M | A | Composed, refined | Formal, professional |
| `Korean_WiseElf` | Wise Elf | M | E | Ancient, mystical | Fantasy, narration |
| `Korean_CheerfulCoolJunior` | Cheerful Cool Junior | M | Y | Popular, friendly | Youth, entertainment |
| `Korean_DecisiveQueen` | Decisive Queen | F | A | Commanding | Drama, fantasy |
| `Korean_ColdYoungMan` | Cold Young Man | M | Y | Distant, aloof | Drama, romance |
| `Korean_MysteriousGirl` | Mysterious Girl | F | Y | Enigmatic, secretive | Mystery, drama |
| `Korean_QuirkyGirl` | Quirky Girl | F | Y | Eccentric, unique | Comedy |
| `Korean_ConsiderateSenior` | Considerate Senior | M | E | Thoughtful, caring | Warm, supportive |
| `Korean_CheerfulLittleSister` | Cheerful Little Sister | F | C | Playful, adorable | Family, comedy |
| `Korean_DominantMan` | Dominant Man | M | A | Powerful, commanding | Leadership, action |
| `Korean_AirheadedGirl` | Airheaded Girl | F | Y | Bubbly, spacey | Comedy |
| `Korean_ReliableYouth` | Reliable Youth | M | Y | Trustworthy, dependable | Supportive |
| `Korean_FriendlyBigSister` | Friendly Big Sister | F | A | Warm, protective | Family, support |
| `Korean_GentleBoss` | Gentle Boss | M | A | Kind, understanding | Business |
| `Korean_ColdGirl` | Cold Girl | F | Y | Aloof, distant | Drama, romance |
| `Korean_HaughtyLady` | Haughty Lady | F | A | Arrogant, proud | Drama, comedy |
| `Korean_CharmingElderSister` | Charming Elder Sister | F | A | Graceful | Romance, family |
| `Korean_IntellectualMan` | Intellectual Man | M | A | Smart, knowledgeable | Educational |
| `Korean_CaringWoman` | Caring Woman | F | A | Nurturing | Supportive, warm |
| `Korean_WiseTeacher` | Wise Teacher | M | E | Experienced | Educational |
| `Korean_ConfidentBoss` | Confident Boss | M | A | Self-assured, capable | Business, leadership |
| `Korean_AthleticGirl` | Athletic Girl | F | Y | Sporty, energetic | Sports, fitness |
| `Korean_PossessiveMan` | Possessive Man | M | A | Intense, protective | Romance, drama |
| `Korean_GentleWoman` | Gentle Woman | F | A | Soft-spoken, kind | Calm |
| `Korean_CockyGuy` | Cocky Guy | M | Y | Confident, arrogant | Comedy |
| `Korean_ThoughtfulWoman` | Thoughtful Woman | F | A | Reflective, caring | Drama |
| `Korean_OptimisticYouth` | Optimistic Youth | M | Y | Positive, hopeful | Motivation |

### Spanish (Español)
|
||||
|
||||
| voice_id | Name | G | Age | Description | Best For |
|
||||
|----------|------|---|-----|-------------|----------|
|
||||
| `Spanish_Narrator` | Narrator | M | A | Professional narrator | Documentaries |
|
||||
| `Spanish_CaptivatingStoryteller` | Captivating Storyteller | M | A | Engaging narrator | Audiobooks |
|
||||
| `Spanish_WiseScholar` | Wise Scholar | M | A | Knowledgeable | Educational |
|
||||
| `Spanish_SereneWoman` | Serene Woman | F | A | Calm, peaceful | Relaxation |
|
||||
| `Spanish_MaturePartner` | Mature Partner | M | A | Sophisticated | Romance, drama |
|
||||
| `Spanish_ConfidentWoman` | Confident Woman | F | A | Self-assured | Professional |
|
||||
| `Spanish_DeterminedManager` | Determined Manager | M | A | Ambitious, driven | Business |
|
||||
| `Spanish_BossyLeader` | Bossy Leader | M | A | Commanding | Leadership |
|
||||
| `Spanish_ReservedYoungMan` | Reserved Young Man | M | Y | Quiet, introverted | Drama |
|
||||
| `Spanish_ThoughtfulMan` | Thoughtful Man | M | A | Reflective | Educational |
|
||||
| `Spanish_RationalMan` | Rational Man | M | A | Logical, analytical | Business |
|
||||
| `Spanish_Deep-tonedMan` | Deep-toned Man | M | A | Deep, resonant | Commanding |
| `Spanish_Jovialman` | Jovial Man | M | A | Cheerful, friendly | Entertainment |
| `Spanish_Steadymentor` | Steady Mentor | M | A | Reliable mentor | Guidance |
| `Spanish_ReliableMan` | Reliable Man | M | A | Trustworthy | Professional |
| `Spanish_RomanticHusband` | Romantic Husband | M | A | Loving, romantic | Romance |
| `Spanish_Comedian` | Comedian | M | A | Humorous | Comedy |
| `Spanish_Debator` | Debator | M | A | Persuasive | Debate |
| `Spanish_ToughBoss` | Tough Boss | M | A | Harsh, demanding | Business, drama |
| `Spanish_AngryMan` | Angry Man | M | A | Frustrated | Drama, comedy |
| `Spanish_PowerfulSoldier` | Powerful Soldier | M | A | Strong, brave | Action, military |
| `Spanish_PassionateWarrior` | Passionate Warrior | M | A | Fierce, dedicated | Action, fantasy |
| `Spanish_PowerfulVeteran` | Powerful Veteran | M | A | Experienced | Military |
| `Spanish_SensibleManager` | Sensible Manager | M | A | Practical | Business |
| `Spanish_Kind-heartedGirl` | Kind-hearted Girl | F | C | Warm, compassionate | Children's |
| `Spanish_SophisticatedLady` | Sophisticated Lady | F | A | Elegant, refined | Formal |
| `Spanish_FrankLady` | Frank Lady | F | A | Direct, honest | Comedy |
| `Spanish_Fussyhostess` | Fussy Hostess | F | A | Demanding | Comedy, drama |
| `Spanish_Wiselady` | Wise Lady | F | E | Experienced, wise | Guidance |
| `Spanish_ThoughtfulLady` | Thoughtful Lady | F | A | Considerate | Advice |
| `Spanish_AssertiveQueen` | Assertive Queen | F | A | Commanding | Drama, fantasy |
| `Spanish_CaringGirlfriend` | Caring Girlfriend | F | Y | Nurturing | Romance |
| `Spanish_ChattyGirl` | Chatty Girl | F | Y | Talkative, sociable | Comedy |
| `Spanish_CompellingGirl` | Compelling Girl | F | Y | Persuasive | Marketing |
| `Spanish_WhimsicalGirl` | Whimsical Girl | F | C | Playful, imaginative | Children's |
| `Spanish_Intonategirl` | Intonate Girl | F | Y | Musical, melodic | Singing |
| `Spanish_SincereTeen` | Sincere Teen | M | Y | Honest, genuine | Youth |
| `Spanish_Strong-WilledBoy` | Strong-willed Boy | M | Y | Determined | Youth, motivation |
| `Spanish_EnergeticBoy` | Energetic Boy | M | C | Active, lively | Youth, sports |
| `Spanish_StrictBoss` | Strict Boss | M | A | Strict | Business |
| `Spanish_HumorousElder` | Humorous Elder | M | E | Funny | Comedy |
| `Spanish_SereneElder` | Serene Elder | M | E | Calm, peaceful | Meditation |
| `Spanish_SantaClaus` | Santa Claus | M | E | Festive | Holiday |
| `Spanish_Rudolph` | Rudolph | N | C | Reindeer | Holiday |
| `Spanish_Arnold` | Arnold | M | A | Robotic | Sci-fi |
| `Spanish_Ghost` | Ghost | N | A | Spooky | Horror |
| `Spanish_AnimeCharacter` | Anime Character | N | Y | Anime-style | Animation |

### Portuguese (Português)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Portuguese_Narrator` | Narrator | M | A | Professional narrator | Documentaries |
| `Portuguese_CaptivatingStoryteller` | Captivating Storyteller | M | A | Engaging narrator | Audiobooks |
| `Portuguese_WiseScholar` | Wise Scholar | M | A | Knowledgeable | Educational |
| `Portuguese_Deep-VoicedGentleman` | Deep-voiced Gentleman | M | A | Deep, rich | Commanding |
| `Portuguese_ReservedYoungMan` | Reserved Young Man | M | Y | Quiet, introverted | Drama |
| `Portuguese_ThoughtfulMan` | Thoughtful Man | M | A | Reflective | Educational |
| `Portuguese_RationalMan` | Rational Man | M | A | Logical | Business |
| `Portuguese_Jovialman` | Jovial Man | M | A | Cheerful | Entertainment |
| `Portuguese_Steadymentor` | Steady Mentor | M | A | Reliable mentor | Guidance |
| `Portuguese_ReliableMan` | Reliable Man | M | A | Trustworthy | Professional |
| `Portuguese_RomanticHusband` | Romantic Husband | M | A | Loving | Romance |
| `Portuguese_Comedian` | Comedian | M | A | Humorous | Comedy |
| `Portuguese_Debator` | Debator | M | A | Persuasive | Debate |
| `Portuguese_ToughBoss` | Tough Boss | M | A | Demanding | Business |
| `Portuguese_StrictBoss` | Strict Boss | M | A | Strict | Business |
| `Portuguese_AngryMan` | Angry Man | M | A | Frustrated | Drama |
| `Portuguese_Godfather` | Godfather | M | A | Authoritative | Drama |
| `Portuguese_PowerfulSoldier` | Powerful Soldier | M | A | Strong, brave | Action |
| `Portuguese_PowerfulVeteran` | Powerful Veteran | M | A | Experienced | Military |
| `Portuguese_SensibleManager` | Sensible Manager | M | A | Practical | Business |
| `Portuguese_DeterminedManager` | Determined Manager | M | A | Driven | Business |
| `Portuguese_BossyLeader` | Bossy Leader | M | A | Commanding | Leadership |
| `Portuguese_CalmLeader` | Calm Leader | M | A | Composed, steady | Leadership |
| `Portuguese_FascinatingBoy` | Fascinating Boy | M | Y | Charming | Romance |
| `Portuguese_Strong-WilledBoy` | Strong-willed Boy | M | Y | Determined | Youth |
| `Portuguese_EnergeticBoy` | Energetic Boy | M | C | Active, lively | Youth |
| `Portuguese_FragileBoy` | Fragile Boy | M | Y | Sensitive | Drama |
| `Portuguese_MaturePartner` | Mature Partner | M | A | Sophisticated | Romance |
| `Portuguese_HumorousElder` | Humorous Elder | M | E | Funny | Comedy |
| `Portuguese_SereneElder` | Serene Elder | M | E | Calm | Meditation |
| `Portuguese_ConfidentWoman` | Confident Woman | F | A | Self-assured | Professional |
| `Portuguese_SereneWoman` | Serene Woman | F | A | Calm, peaceful | Relaxation |
| `Portuguese_SentimentalLady` | Sentimental Lady | F | A | Emotional | Drama, romance |
| `Portuguese_Wiselady` | Wise Lady | F | E | Wise | Guidance |
| `Portuguese_GorgeousLady` | Gorgeous Lady | F | A | Beautiful | Romance |
| `Portuguese_LovelyLady` | Lovely Lady | F | A | Sweet, endearing | Warm |
| `Portuguese_Pompouslady` | Pompous Lady | F | A | Self-important | Comedy |
| `Portuguese_CharmingQueen` | Charming Queen | F | A | Elegant | Drama, fantasy |
| `Portuguese_AssertiveQueen` | Assertive Queen | F | A | Commanding | Drama, fantasy |
| `Portuguese_CharmingLady` | Charming Lady | F | A | Sophisticated | Professional |
| `Portuguese_InspiringLady` | Inspiring Lady | F | A | Motivating | Motivation |
| `Portuguese_StressedLady` | Stressed Lady | F | A | Anxious | Comedy |
| `Portuguese_FrankLady` | Frank Lady | F | A | Direct, honest | Comedy |
| `Portuguese_Fussyhostess` | Fussy Hostess | F | A | Demanding | Comedy |
| `Portuguese_ThoughtfulLady` | Thoughtful Lady | F | A | Considerate | Advice |
| `Portuguese_GentleTeacher` | Gentle Teacher | F | A | Kind, patient | Educational |
| `Portuguese_Kind-heartedGirl` | Kind-hearted Girl | F | C | Warm | Children's |
| `Portuguese_SweetGirl` | Sweet Girl | F | Y | Sweet, adorable | Romance |
| `Portuguese_AttractiveGirl` | Attractive Girl | F | Y | Charming | Entertainment |
| `Portuguese_PlayfulGirl` | Playful Girl | F | Y | Fun-loving | Comedy |
| `Portuguese_SmartYoungGirl` | Smart Young Girl | F | Y | Intelligent | Educational |
| `Portuguese_UpsetGirl` | Upset Girl | F | Y | Distressed | Drama |
| `Portuguese_ElegantGirl` | Elegant Girl | F | Y | Graceful | Formal |
| `Portuguese_CompellingGirl` | Compelling Girl | F | Y | Persuasive | Marketing |
| `Portuguese_WhimsicalGirl` | Whimsical Girl | F | C | Playful | Children's |
| `Portuguese_ChattyGirl` | Chatty Girl | F | Y | Talkative | Comedy |
| `Portuguese_NaughtySchoolgirl` | Naughty Schoolgirl | F | Y | Mischievous | Comedy |
| `Portuguese_SadTeen` | Sad Teen | F | Y | Melancholic | Drama |
| `Portuguese_CaringGirlfriend` | Caring Girlfriend | F | Y | Nurturing | Romance |
| `Portuguese_FriendlyNeighbor` | Friendly Neighbor | F | A | Warm, helpful | Community |
| `Portuguese_Dramatist` | Dramatist | M | A | Theatrical | Drama |
| `Portuguese_TheatricalActor` | Theatrical Actor | M | A | Dramatic | Entertainment |
| `Portuguese_Conscientiousinstructor` | Conscientious Instructor | M | A | Diligent | Training |
| `Portuguese_PlayfulSpirit` | Playful Spirit | N | C | Cheerful spirit | Fantasy |
| `Portuguese_SantaClaus` | Santa Claus | M | E | Festive | Holiday |
| `Portuguese_Rudolph` | Rudolph | N | C | Reindeer | Holiday |
| `Portuguese_Arnold` | Arnold | M | A | Robotic | Sci-fi |
| `Portuguese_CharmingSanta` | Charming Santa | M | E | Charismatic | Holiday |
| `Portuguese_Grinch` | Grinch | M | A | Mischievous | Comedy |
| `Portuguese_Ghost` | Ghost | N | A | Spooky | Horror |
| `Portuguese_GrimReaper` | Grim Reaper | N | A | Dark, ominous | Horror |

### French (Français)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `French_Male_Speech_New` | Level-Headed Man | M | A | Calm, reasonable | Professional |
| `French_Female_News Anchor` | Patient Female Presenter | F | A | Clear, patient | News |
| `French_CasualMan` | Casual Man | M | A | Relaxed, informal | Casual |
| `French_MovieLeadFemale` | Movie Lead Female | F | A | Dramatic, expressive | Drama |
| `French_FemaleAnchor` | Female Anchor | F | A | Professional anchor | News |

### Indonesian (Bahasa Indonesia)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Indonesian_SweetGirl` | Sweet Girl | F | C | Sweet, adorable | Children's |
| `Indonesian_ReservedYoungMan` | Reserved Young Man | M | Y | Quiet, introverted | Drama |
| `Indonesian_CharmingGirl` | Charming Girl | F | Y | Attractive | Romance |
| `Indonesian_CalmWoman` | Calm Woman | F | A | Composed, peaceful | Relaxation |
| `Indonesian_ConfidentWoman` | Confident Woman | F | A | Self-assured | Professional |
| `Indonesian_CaringMan` | Caring Man | M | A | Nurturing | Family |
| `Indonesian_BossyLeader` | Bossy Leader | M | A | Commanding | Leadership |
| `Indonesian_DeterminedBoy` | Determined Boy | M | Y | Ambitious | Youth |
| `Indonesian_GentleGirl` | Gentle Girl | F | Y | Soft-spoken | Calm |

### German (Deutsch)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `German_FriendlyMan` | Friendly Man | M | A | Warm, approachable | Casual |
| `German_SweetLady` | Sweet Lady | F | A | Pleasant, kind | Warm |
| `German_PlayfulMan` | Playful Man | M | A | Fun-loving | Comedy |

### Russian (Русский)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Russian_HandsomeChildhoodFriend` | Handsome Childhood Friend | M | Y | Charming | Romance |
| `Russian_BrightHeroine` | Bright Queen | F | A | Lively, strong | Drama |
| `Russian_AmbitiousWoman` | Ambitious Woman | F | A | Driven | Professional |
| `Russian_ReliableMan` | Reliable Man | M | A | Trustworthy | Professional |
| `Russian_CrazyQueen` | Crazy Girl | F | Y | Wild, unpredictable | Comedy |
| `Russian_PessimisticGirl` | Pessimistic Girl | F | Y | Gloomy | Comedy |
| `Russian_AttractiveGuy` | Attractive Guy | M | A | Charming | Romance |
| `Russian_Bad-temperedBoy` | Bad-tempered Boy | M | Y | Irritable, grumpy | Comedy |

### Italian (Italiano)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Italian_BraveHeroine` | Brave Heroine | F | A | Courageous | Action |
| `Italian_Narrator` | Narrator | M | A | Professional narrator | Storytelling |
| `Italian_WanderingSorcerer` | Wandering Sorcerer | M | A | Mysterious | Fantasy |
| `Italian_DiligentLeader` | Diligent Leader | M | A | Hardworking | Leadership |

### Arabic (العربية)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Arabic_CalmWoman` | Calm Woman | F | A | Composed | Relaxation |
| `Arabic_FriendlyGuy` | Friendly Guy | M | A | Warm | Casual |

### Turkish (Türkçe)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Turkish_CalmWoman` | Calm Woman | F | A | Composed | Relaxation |
| `Turkish_Trustworthyman` | Trustworthy Man | M | A | Reliable | Professional |

### Ukrainian (Українська)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Ukrainian_CalmWoman` | Calm Woman | F | A | Composed | Relaxation |
| `Ukrainian_WiseScholar` | Wise Scholar | M | A | Knowledgeable | Educational |

### Dutch (Nederlands)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Dutch_kindhearted_girl` | Kind-hearted Girl | F | C | Warm | Children's |
| `Dutch_bossy_leader` | Bossy Leader | M | A | Commanding | Leadership |

### Vietnamese (Tiếng Việt)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Vietnamese_kindhearted_girl` | Kind-hearted Girl | F | C | Warm | Children's |

### Thai (ภาษาไทย)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Thai_male_1_sample8` | Serene Man | M | A | Calm, peaceful | Relaxation |
| `Thai_male_2_sample2` | Friendly Man | M | A | Warm | Casual |
| `Thai_female_1_sample1` | Confident Woman | F | A | Self-assured | Professional |
| `Thai_female_2_sample2` | Energetic Woman | F | A | Active, lively | Motivation |

### Polish (Polski)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Polish_male_1_sample4` | Male Narrator | M | A | Professional | Narration |
| `Polish_male_2_sample3` | Male Anchor | M | A | Professional | News |
| `Polish_female_1_sample1` | Calm Woman | F | A | Composed | Relaxation |
| `Polish_female_2_sample3` | Casual Woman | F | A | Relaxed | Casual |

### Romanian (Română)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `Romanian_male_1_sample2` | Reliable Man | M | A | Trustworthy | Professional |
| `Romanian_male_2_sample1` | Energetic Youth | M | Y | Active, lively | Youth |
| `Romanian_female_1_sample4` | Optimistic Youth | F | Y | Positive | Motivation |
| `Romanian_female_2_sample1` | Gentle Woman | F | A | Soft-spoken | Calm |

### Greek (Ελληνικά)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `greek_male_1a_v1` | Thoughtful Mentor | M | A | Reflective, wise | Guidance |
| `Greek_female_1_sample1` | Gentle Lady | F | A | Soft-spoken | Calm |
| `Greek_female_2_sample3` | Girl Next Door | F | Y | Friendly | Casual |

### Czech (Čeština)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `czech_male_1_v1` | Assured Presenter | M | A | Confident | Presentations |
| `czech_female_5_v7` | Steadfast Narrator | F | A | Reliable | Storytelling |
| `czech_female_2_v2` | Elegant Lady | F | A | Graceful | Formal |

### Finnish (Suomi)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `finnish_male_3_v1` | Upbeat Man | M | A | Cheerful | Motivation |
| `finnish_male_1_v2` | Friendly Boy | M | Y | Warm | Children's |
| `finnish_female_4_v1` | Assertive Woman | F | A | Confident | Professional |

### Hindi (हिन्दी)

| voice_id | Name | G | Age | Description | Best For |
|----------|------|---|-----|-------------|----------|
| `hindi_male_1_v2` | Trustworthy Advisor | M | A | Reliable, wise | Guidance |
| `hindi_female_2_v1` | Tranquil Woman | F | A | Calm, peaceful | Meditation |
| `hindi_female_1_v2` | News Anchor | F | A | Professional | News |

---

## Voice Parameters

### VoiceSetting

```python
from scripts.tts.utils import VoiceSetting

voice = VoiceSetting(
    voice_id="male-qn-qingse",
    speed=1.0,       # 0.5–2.0 (default 1.0)
    volume=1.0,      # 0.1–10.0 (default 1.0)
    pitch=0,         # -12 to +12 (default 0)
    emotion="",      # Leave empty for speech-2.8 auto-matching (recommended)
)
```

### Speed

| Value | Effect |
|-------|--------|
| 0.75 | Slower, deliberate (news, tutorials) |
| 1.0 | Normal pace |
| 1.25 | Slightly faster (energetic) |
| 1.5+ | Fast (time-sensitive) |

### Emotion

| Value | Description | Model Support |
|-------|-------------|---------------|
| *(empty)* | Auto-match from text | speech-2.8 (recommended) |
| `happy` | Cheerful, upbeat | All |
| `sad` | Melancholic, somber | All |
| `angry` | Intense, frustrated | All |
| `fearful` | Anxious, nervous | All |
| `disgusted` | Repulsed | All |
| `surprised` | Astonished | All |
| `calm` | Neutral tone | All |
| `fluent` | Natural, lively | speech-2.6 only |
| `whisper` | Soft, gentle | speech-2.6 only |
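
The ranges and emotion values listed above can be checked before a request is sent. The following is a minimal sketch, not the toolkit's own code: `VoiceParams` and `validate` are hypothetical names (a stand-in for, not a copy of, `scripts.tts.utils.VoiceSetting`), and matching the model by the prefix `speech-2.6` is an assumption about how model names are spelled.

```python
from dataclasses import dataclass

# Emotion values from the table above; "" means "auto-match from text".
EMOTIONS = {"", "happy", "sad", "angry", "fearful", "disgusted",
            "surprised", "calm", "fluent", "whisper"}
SPEECH_26_ONLY = {"fluent", "whisper"}


@dataclass
class VoiceParams:
    """Hypothetical stand-in for VoiceSetting, used only for validation."""
    voice_id: str
    speed: float = 1.0
    volume: float = 1.0
    pitch: int = 0
    emotion: str = ""


def validate(params, model="speech-2.8"):
    """Return a list of problems; an empty list means the settings fit the documented ranges."""
    problems = []
    if not 0.5 <= params.speed <= 2.0:
        problems.append("speed must be within 0.5-2.0")
    if not 0.1 <= params.volume <= 10.0:
        problems.append("volume must be within 0.1-10.0")
    if not -12 <= params.pitch <= 12:
        problems.append("pitch must be within -12 to +12")
    if params.emotion not in EMOTIONS:
        problems.append(f"unknown emotion: {params.emotion!r}")
    # Assumption: speech-2.6 model names start with "speech-2.6".
    elif params.emotion in SPEECH_26_ONLY and not model.startswith("speech-2.6"):
        problems.append(f"emotion {params.emotion!r} is listed as speech-2.6 only")
    return problems
```

Calling `validate(VoiceParams("male-qn-qingse", speed=3.0))` would report the out-of-range speed instead of leaving the API to reject it.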

---

## Custom Voices

### Voice Cloning

Create custom voices from audio samples:
- Source: 10s–5min, mp3/wav/m4a, ≤20MB, clear single speaker
- Best: 30–60s of clean speech with varied intonation
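
The source limits above lend themselves to a local pre-check before uploading a sample. This is a rough sketch under stated assumptions: `check_clone_sample` is a hypothetical helper, the duration is supplied by the caller (for example from `ffprobe`), and "20MB" is interpreted as 20 × 1024 × 1024 bytes.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MAX_BYTES = 20 * 1024 * 1024          # assumption: "20MB" means 20 MiB
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60  # 10s to 5min
BEST_SECONDS = (30, 60)                # recommended range, advisory only


def check_clone_sample(filename, size_bytes, duration_seconds):
    """Return (errors, hints) for a candidate voice-cloning sample."""
    errors, hints = [], []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        errors.append("file is larger than 20MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        errors.append("duration must be between 10s and 5min")
    elif not BEST_SECONDS[0] <= duration_seconds <= BEST_SECONDS[1]:
        hints.append("30-60s of clean speech usually works best")
    return errors, hints
```

The check cannot tell whether the recording has a single clear speaker; that still needs a human listen.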

### Voice Design

Generate voices from text descriptions:
- Include: gender, age, vocal characteristics, tone, use case
- Example: "A warm, grandmotherly voice with gentle pacing, perfect for bedtime stories"

Custom voices expire after 7 days if not used with TTS. List all voices: `python scripts/tts/generate_voice.py list-voices`
130
skills/minimax-multimodal-toolkit/references/video-api.md
Normal file
@@ -0,0 +1,130 @@
# MiniMax Video Generation API Documentation

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/video_generation` | POST | Create video generation task (all 4 modes) |
| `/v1/query/video_generation` | GET | Query task status |
| `/v1/files/retrieve` | GET | Get video download URL |
| `/v1/video_template_generation` | POST | Create template-based video task |
| `/v1/query/video_template_generation` | GET | Query template task status |

**Base URL:** `https://api.minimaxi.com` (China Mainland) or `https://api.minimax.io` (Global)
**Auth:** `Authorization: Bearer {MINIMAX_API_KEY}`

---

## Video Generation Models

### Text-to-Video (T2V) Models
| Model | Resolution | Duration | Notes |
|-------|-----------|----------|-------|
| MiniMax-Hailuo-2.3 | 768P (default), 1080P | 6s (1080P), 6/10s (768P) | Recommended, latest |
| MiniMax-Hailuo-2.3-Fast | 768P (default), 1080P | 6s (1080P), 6/10s (768P) | Fast variant |
| MiniMax-Hailuo-02 | 512P, 768P (default), 1080P | 6s (1080P), 6/10s (512P/768P) | Previous gen |
| T2V-01-Director | 720P | 6s | Director control |
| T2V-01 | 720P | 6s | Base model |
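
The T2V table above can be encoded as a lookup so that an unsupported model, resolution, or duration is caught before a task is created. A minimal sketch, with `T2V_MODELS` and `t2v_supported` as hypothetical names; the data is copied from the table, and treating 720P as the default for the `T2V-01` models is an assumption based on it being their only listed resolution.

```python
# Model -> resolution -> allowed durations in seconds (from the T2V table).
T2V_MODELS = {
    "MiniMax-Hailuo-2.3": {"768P": {6, 10}, "1080P": {6}},
    "MiniMax-Hailuo-2.3-Fast": {"768P": {6, 10}, "1080P": {6}},
    "MiniMax-Hailuo-02": {"512P": {6, 10}, "768P": {6, 10}, "1080P": {6}},
    "T2V-01-Director": {"720P": {6}},
    "T2V-01": {"720P": {6}},
}


def default_resolution(model):
    """768P for the Hailuo models; 720P (the only option) for the T2V-01 models."""
    return "768P" if "768P" in T2V_MODELS[model] else "720P"


def t2v_supported(model, resolution=None, duration=6):
    """Return True if the table lists this model/resolution/duration combination."""
    if model not in T2V_MODELS:
        return False
    resolution = resolution or default_resolution(model)
    return duration in T2V_MODELS[model].get(resolution, set())
```

For example, `t2v_supported("MiniMax-Hailuo-2.3", "1080P", 10)` is false because 1080P is listed for 6-second clips only.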

### Image-to-Video (I2V) Models
| Model | Resolution | Duration | Notes |
|-------|-----------|----------|-------|
| MiniMax-Hailuo-2.3 | 768P, 1080P | 6s | Recommended |
| MiniMax-Hailuo-2.3-Fast | 768P, 1080P | 6s | Fast variant |
| MiniMax-Hailuo-02 | 512P, 768P, 1080P | 6/10s | Previous gen |
| I2V-01-Director | 720P | 6s | Director control |
| I2V-01-live | 720P | 6s | Live photo style |
| I2V-01 | 720P | 6s | Base model |

### Start-End Frame Model
| Model | Notes |
|-------|-------|
| MiniMax-Hailuo-02 | Only model supporting start-end frame |

### Subject Reference Model
| Model | Notes |
|-------|-------|
| S2V-01 | Face consistency across video |

---

## Request Parameters

### Common Parameters (All Modes)
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | Model name |
| prompt | string | Depends | - | Video description, max 2000 chars |
| duration | int | No | 6 | Video length in seconds |
| resolution | string | No | 768P/720P | Video resolution |
| prompt_optimizer | bool | No | true | Auto-optimize prompt |
| fast_pretreatment | bool | No | false | Shorten optimizer duration |
| callback_url | string | No | - | Webhook URL |
| aigc_watermark | bool | No | false | Add watermark |

### Image-to-Video Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| first_frame_image | string | Yes | Starting frame (URL or base64 data URL) |

**Image requirements:** JPG/JPEG/PNG/WebP, < 20MB, short side > 300px, aspect ratio 2:5–5:2.
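
These image requirements are easy to check locally before encoding a frame. The sketch below is illustrative only: `check_first_frame` is a hypothetical helper, the caller supplies the pixel dimensions, the ratio is assumed to be width:height, and "20MB" is read as 20 × 1024 × 1024 bytes.

```python
ALLOWED_FORMATS = {"jpg", "jpeg", "png", "webp"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB


def check_first_frame(width, height, size_bytes, extension):
    """Return a list of violations of the documented first-frame requirements."""
    problems = []
    if extension.lower().lstrip(".") not in ALLOWED_FORMATS:
        problems.append("format must be JPG, JPEG, PNG, or WebP")
    if size_bytes >= MAX_BYTES:
        problems.append("file must be smaller than 20MB")
    if min(width, height) <= 300:
        problems.append("short side must be greater than 300px")
    # Aspect ratio between 2:5 and 5:2, compared with integers to avoid rounding.
    if not (5 * width >= 2 * height and 2 * width <= 5 * height):
        problems.append("aspect ratio must be between 2:5 and 5:2")
    return problems
```

A 1280×720 PNG of 1 MB passes, while a 1000×200 banner fails on both the short side and the aspect ratio.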

### Start-End Frame Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| first_frame_image | string | Yes | Starting frame |
| last_frame_image | string | Yes | Ending frame |

### Subject Reference Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| subject_reference | array | Yes | Array of subject objects |

Each object has `type` and `image` (array of image URLs):
```json
[{ "type": "character", "image": ["<image_url>"] }]
```
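
Putting the parameter tables together, a request body for any of the four modes can be assembled as a plain dictionary. This is a sketch rather than the toolkit's implementation: `build_payload` is a hypothetical helper, the field names are the ones documented above, and the example image URL is a placeholder.

```python
def build_payload(model, prompt=None, first_frame_image=None,
                  last_frame_image=None, subject_images=None, **common):
    """Build a JSON-serialisable body for POST /v1/video_generation."""
    if prompt is not None and len(prompt) > 2000:
        raise ValueError("prompt exceeds the documented 2000-character limit")
    if last_frame_image is not None and first_frame_image is None:
        raise ValueError("start-end frame mode needs first_frame_image as well")

    payload = {"model": model}
    if prompt is not None:
        payload["prompt"] = prompt
    if first_frame_image is not None:      # image-to-video or start-end frame
        payload["first_frame_image"] = first_frame_image
    if last_frame_image is not None:       # start-end frame only
        payload["last_frame_image"] = last_frame_image
    if subject_images:                     # subject reference
        payload["subject_reference"] = [
            {"type": "character", "image": list(subject_images)}
        ]
    # Optional common parameters such as duration, resolution, prompt_optimizer.
    payload.update(common)
    return payload
```

For instance, `build_payload("S2V-01", prompt="A chef plating dessert", subject_images=["https://example.com/face.jpg"])` yields a body whose `subject_reference` matches the JSON shape shown above.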

---

## Camera Instructions

Camera movement is requested with bracketed `[指令]` (instruction) tags inside the prompt. Supported by Hailuo-2.3, Hailuo-02, and the Director models:

| Category | Instructions | Meaning |
|----------|-------------|---------|
| Truck | `[左移]`, `[右移]` | Move the camera left / right |
| Pan | `[左摇]`, `[右摇]` | Pan left / right |
| Push/Pull | `[推进]`, `[拉远]` | Push in / pull out |
| Pedestal | `[上升]`, `[下降]` | Raise / lower the camera |
| Tilt | `[上摇]`, `[下摇]` | Tilt up / down |
| Zoom | `[变焦推近]`, `[变焦拉远]` | Zoom in / zoom out |
| Other | `[晃动]`, `[跟随]`, `[固定]` | Shake, tracking shot, static shot |

For simultaneous movement, combine up to three instructions in one bracket: `[左摇,上升]`. For sequential movement, place them at different points in the prompt: `...[推进], then ...[拉远]`
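
The bracket syntax above can be generated rather than typed by hand, which guards against unknown instructions and the three-instruction limit. A small sketch; `camera_tag` is a hypothetical helper and the instruction set is copied from the table.

```python
# Instruction names copied from the camera instruction table.
CAMERA_INSTRUCTIONS = {
    "左移", "右移", "左摇", "右摇", "推进", "拉远", "上升", "下降",
    "上摇", "下摇", "变焦推近", "变焦拉远", "晃动", "跟随", "固定",
}


def camera_tag(*instructions):
    """Return one bracketed tag that applies the given instructions simultaneously."""
    if not 1 <= len(instructions) <= 3:
        raise ValueError("a bracket takes between one and three instructions")
    unknown = [name for name in instructions if name not in CAMERA_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"unknown camera instruction(s): {unknown}")
    return "[" + ",".join(instructions) + "]"


# Sequential movement: separate tags placed at different points in the prompt.
prompt = f"The camera starts with {camera_tag('推进')} toward the face, then {camera_tag('拉远')} to reveal the full scene"
```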

---

## Response

**Query status:** `Preparing`, `Queueing`, `Processing`, `Success`, `Fail`

**Error codes:** 0 (success), 1002 (rate limited), 1004 (auth failed), 1008 (insufficient balance), 1026 (sensitive content), 2013 (invalid params), 2049 (invalid API key)
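
Because generation is asynchronous, a client typically polls the query endpoint until the status leaves the pending states. The sketch below shows one way to structure that loop. It makes several assumptions: `fetch_status` is a caller-supplied function (hypothetical) that performs the GET on `/v1/query/video_generation` and returns the decoded JSON as a dict, the response carries the status under a `status` key, and the polling interval and limit are arbitrary.

```python
import time

ERROR_CODES = {
    0: "success",
    1002: "rate limited",
    1004: "auth failed",
    1008: "insufficient balance",
    1026: "sensitive content",
    2013: "invalid params",
    2049: "invalid API key",
}
PENDING_STATUSES = {"Preparing", "Queueing", "Processing"}


def describe_error(code):
    """Map a documented error code to its short description."""
    return ERROR_CODES.get(code, f"unknown error code {code}")


def wait_for_video(fetch_status, task_id, interval=10.0, max_polls=60, sleep=time.sleep):
    """Poll until the task reports Success; raise on Fail, unknown status, or timeout."""
    for _ in range(max_polls):
        response = fetch_status(task_id)
        status = response.get("status")  # assumption: status lives under this key
        if status == "Success":
            return response
        if status == "Fail":
            raise RuntimeError(f"task {task_id} failed")
        if status not in PENDING_STATUSES:
            raise RuntimeError(f"unexpected status {status!r} for task {task_id}")
        sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop testable without network access; a real caller would wrap its HTTP client in `fetch_status`.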

---

## Video Templates

| Template | ID | Input | Description |
|----------|-----|-------|-------------|
| Diving | 392753057216684038 | Image | Diving motion |
| Rings | 393881433990066176 | Image | Gymnastics rings |
| Survival | 393769180141805569 | Image + Text | Outdoor survival |
| Labubu | 394246956137422856 | Image | Labubu character |
| McDonald's Delivery | 393879757702918151 | Image | Pet courier |
| Tibetan Portrait | 393766210733957121 | Image | Cultural portrait |
| Female Model Ads | 393866076583718914 | Image | Female fashion |
| Male Model Ads | 393876118804459526 | Image | Male fashion |
| Winter Romance | 393857704283172856 | Image | Snowy portrait |
| Four Seasons | 398574688191234048 | Image | Seasonal portrait |
| Helpless Moments | 394125185182695432 | Text only | Comedic animation |
@@ -0,0 +1,98 @@
# Video Prompt Writing Guide

## Prompt Structure

### Basic Formula
**Main subject + Scene/Space + Movement/Change**

Examples:
- "A puppy runs toward the camera in a sunny park"
- "A woman walks in the rain holding an umbrella on a city street"
- "A stream flows through a green valley with morning mist"

### Professional Formula
**Main subject + Scene + Movement + Camera motion + Aesthetic atmosphere**

Examples:
- "A couple sits on a park bench, warm golden hour lighting, [固定] framing, intimate and romantic atmosphere"
- "A young man in a suit eats noodles at a street stall, [拉远] revealing the busy night market, warm tones, cinematic"
- "A dancer performs contemporary dance in an empty studio, [跟随] smooth tracking, dramatic side lighting"
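
The professional formula can be treated as a template with optional slots. A minimal sketch, assuming a hypothetical `build_prompt` helper that joins the components with commas; the wording of each component is still up to the writer.

```python
def build_prompt(subject, movement, scene, camera=None, atmosphere=None):
    """Assemble subject + movement + scene, then optional camera and atmosphere parts."""
    core = " ".join(part.strip() for part in (subject, movement, scene) if part.strip())
    parts = [core]
    if camera:
        parts.append(camera.strip())       # e.g. "[跟随] smooth tracking"
    if atmosphere:
        parts.append(atmosphere.strip())   # e.g. "dramatic side lighting"
    return ", ".join(parts)


example = build_prompt(
    subject="A dancer",
    movement="performs contemporary dance",
    scene="in an empty studio",
    camera="[跟随] smooth tracking",
    atmosphere="dramatic side lighting",
)
```

With these inputs, `example` reproduces the third sample prompt above. Leaving out `camera` and `atmosphere` gives a prompt in the basic formula.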

---

## Key Principles

1. **More precise language → more accurate video**
2. **Richer description → better generation quality**
3. **Keep prompts focused on 5-6 seconds of action**: do not describe too many events
4. **Combine shot types with mood descriptors** for professional output

---

## Camera Instructions Usage

### Simultaneous Camera Movement
Place multiple instructions in one bracket:
- `[左摇,上升]`: pan left while rising
- `[推进,下摇]`: push in while tilting down

### Sequential Camera Movement
Place instructions at different points in the prompt:
- "The camera starts with [推进] toward the face, then [拉远] to reveal the full scene"

---

## Style-Specific Prompt Tips

### Realistic / Cinematic Style
- Mention lighting: "golden hour", "overcast sky", "dramatic side lighting"
- Color grading: "warm tones", "cool desaturated palette", "high contrast"
- Texture: "rain droplets on glass", "dust particles in sunlight"
- Cinematic terms: "shallow depth of field", "anamorphic lens flare"

### Animation Style
- Substyle: "2D anime", "3D Pixar-style", "watercolor animation", "stop-motion"
- Character design: "big expressive eyes", "chibi proportions"
- Effects: "sparkle particles", "speed lines", "dramatic wind effects"

### Product / Commercial Style
- Product details: "smooth surface", "premium materials", "elegant design"
- Studio lighting: "soft box lighting", "rim light", "gradient background"
- Motion: "slow rotation", "smooth reveal", "gentle float"

### Fantasy / Sci-Fi Style
- World elements: "floating islands", "neon cyberpunk city", "enchanted forest"
- VFX: "magic particles", "holographic displays", "energy beams"
- Scale: "vast landscape", "towering structures", "infinite horizon"

### Nature / Documentary Style
- Terminology: "macro shot", "time-lapse", "wildlife behavior"
- Phenomena: "morning dew", "sunset colors", "storm clouds"
- Precision: "slow motion at 240fps", "underwater perspective"

---

## Image-to-Video Prompt Tips

Focus on **movement and change** since the image establishes the visual:
- Image of still lake → "Gentle ripples spread across the water surface, a breeze rustles the trees, [固定] fixed camera, peaceful"
- Image of portrait → "The person slowly smiles and turns their head, natural blinking, [推进] subtle push in, warm lighting"

---

## Prompt Building Checklist

1. **Subject**: Appearance, clothing, color, expression, posture
2. **Action**: 1-2 key temporal actions ("first...then...")
3. **Scene**: Setting with foreground + background + atmosphere
4. **Camera**: Bracketed camera instructions such as `[推进]` for precise control
5. **Aesthetic**: Lighting, color, texture, cinematic quality

## Common Mistakes

1. Too many events for 6-second videos
2. Conflicting camera instructions
3. Vague descriptions
4. Static descriptions without motion
5. Missing aesthetic layer
6. Overlong prompts (keep under 200 words)
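
A couple of the mistakes listed above can be caught mechanically. The sketch below is a rough heuristic, not part of the toolkit: `lint_prompt` is a hypothetical helper that checks only the word count and the presence of a bracketed camera instruction, since vagueness and conflicting movements need human judgment.

```python
import re

# Matches a bracketed instruction such as [推进] or [左摇,上升].
CAMERA_TAG = re.compile(r"\[[^\[\]]+\]")


def lint_prompt(prompt, max_words=200):
    """Return warnings for the mechanically checkable prompt mistakes."""
    warnings = []
    word_count = len(prompt.split())
    if word_count >= max_words:
        warnings.append(f"prompt has {word_count} words; keep it under {max_words}")
    if not CAMERA_TAG.search(prompt):
        warnings.append("no bracketed camera instruction found")
    return warnings
```

An empty result does not mean the prompt is good, only that it avoids these two issues.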
156
skills/minimax-multimodal-toolkit/scripts/check_environment.sh
Executable file
@@ -0,0 +1,156 @@
#!/usr/bin/env bash
# MiniMax Multi-Modal Toolkit — Environment Check
#
# Usage:
#   bash scripts/check_environment.sh
#   bash scripts/check_environment.sh --test-api
set -euo pipefail

PASSED=0
FAILED=0
TOTAL=0

check() {
    TOTAL=$((TOTAL + 1))
    if "$@"; then
        PASSED=$((PASSED + 1))
    else
        FAILED=$((FAILED + 1))
    fi
}

check_curl() {
    if command -v curl &>/dev/null; then
        echo "[OK] curl installed"
        return 0
    fi
    echo "[FAIL] curl not installed"
    return 1
}

check_ffmpeg() {
    if command -v ffmpeg &>/dev/null; then
        echo "[OK] FFmpeg installed"
        return 0
    fi
    echo "[FAIL] FFmpeg not installed"
    return 1
}

check_ffprobe() {
    if command -v ffprobe &>/dev/null; then
        echo "[OK] ffprobe installed"
        return 0
    fi
    echo "[FAIL] ffprobe not installed"
    return 1
}

check_jq() {
    if command -v jq &>/dev/null; then
        echo "[OK] jq installed"
        return 0
    fi
    echo "[FAIL] jq not installed (brew install jq / apt install jq)"
    return 1
}

check_xxd() {
    if command -v xxd &>/dev/null; then
        echo "[OK] xxd installed"
        return 0
    fi
    echo "[FAIL] xxd not installed"
    return 1
}

check_api_host() {
    local api_host="${MINIMAX_API_HOST:-}"
    if [[ -z "$api_host" ]]; then
        echo "[FAIL] MINIMAX_API_HOST not set"
        echo " China Mainland: export MINIMAX_API_HOST='https://api.minimaxi.com'"
        echo " Global: export MINIMAX_API_HOST='https://api.minimax.io'"
        return 1
    fi
    if [[ "$api_host" != "https://api.minimaxi.com" && "$api_host" != "https://api.minimax.io" ]]; then
        # A non-standard host is reported as a warning, not a failure.
        echo "[WARN] MINIMAX_API_HOST has non-standard value: $api_host"
        echo " Expected: https://api.minimaxi.com (China) or https://api.minimax.io (Global)"
        return 0
    fi
    echo "[OK] MINIMAX_API_HOST set ($api_host)"
    return 0
}

check_api_key() {
    local api_key="${MINIMAX_API_KEY:-}"
    if [[ -z "$api_key" ]]; then
        echo "[FAIL] MINIMAX_API_KEY not set"
        echo " export MINIMAX_API_KEY='your-key'"
        return 1
    fi
    if [[ "$api_key" != sk-api* && "$api_key" != sk-cp* ]]; then
        echo "[FAIL] Invalid API key format"
        echo " Expected: sk-api-xxx... or sk-cp-xxx..."
        echo " Got: ${api_key:0:20}..."
        return 1
    fi
    echo "[OK] MINIMAX_API_KEY set (${#api_key} chars)"
    return 0
}

check_api_connectivity() {
    local api_host="${MINIMAX_API_HOST:-}"
    local api_key="${MINIMAX_API_KEY:-}"
    if [[ -z "$api_key" ]]; then
        echo "[FAIL] API connectivity skipped (MINIMAX_API_KEY not set)"
        return 1
    fi
    if [[ -z "$api_host" ]]; then
        echo "[FAIL] API connectivity skipped (MINIMAX_API_HOST not set)"
        return 1
    fi
    local http_code
    http_code=$(curl -s -o /dev/null -w "%{http_code}" \
        -H "Authorization: Bearer $api_key" \
        --max-time 10 \
        "$api_host" 2>/dev/null) || true
    # curl prints 000 when no HTTP response was received (DNS failure, timeout),
    # so that value must not count as reachable.
    if [[ -n "$http_code" && "$http_code" != "000" && "$http_code" -lt 500 ]] 2>/dev/null; then
        echo "[OK] API host reachable (HTTP $http_code)"
        return 0
    fi
    echo "[FAIL] API host unreachable ($api_host)"
    return 1
}
|
||||
|
||||
# --- Main ---
TEST_API=false
for arg in "$@"; do
    case "$arg" in
        --test-api) TEST_API=true ;;
    esac
done

echo "MiniMax Multi-Modal Toolkit - Environment Check"
echo "========================================"

check check_curl
check check_ffmpeg
check check_ffprobe
check check_jq
check check_xxd
check check_api_host
check check_api_key

if $TEST_API; then
    check check_api_connectivity
fi

echo ""
echo "========================================"
if [[ $FAILED -eq 0 ]]; then
    echo "All $TOTAL checks passed!"
    exit 0
else
    echo "$FAILED check(s) failed out of $TOTAL"
    exit 1
fi
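The connectivity check above accepts any HTTP status below 500 as proof that the host answered. A minimal sketch of that classification as a standalone helper follows. It assumes curl's `%{http_code}` write-out prints `000` when no HTTP response arrives, and the name `is_reachable_code` is hypothetical, not part of the toolkit.

```shell
# Hypothetical helper (not part of the toolkit): succeed only for a real
# HTTP status below 500.
is_reachable_code() {
    case "$1" in
        ''|*[!0-9]*|000) return 1 ;;   # empty, non-numeric, or "no response"
    esac
    [ "$1" -lt 500 ]
}

for code in 200 401 000 503; do
    if is_reachable_code "$code"; then
        echo "$code: reachable"
    else
        echo "$code: unreachable"
    fi
done
```

A 401 counts as reachable here on purpose: the host answered, even though the credentials were rejected.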
277
skills/minimax-multimodal-toolkit/scripts/image/generate_image.sh
Executable file
@@ -0,0 +1,277 @@
#!/usr/bin/env bash
# MiniMax Image Generation CLI (pure bash)
#
# Usage:
#   bash scripts/image/generate_image.sh --prompt "A cat on a rooftop at sunset" -o minimax-output/cat.png
#   bash scripts/image/generate_image.sh --mode i2i --prompt "A girl reading in a library" --ref-image face.jpg -o minimax-output/girl.png
#   bash scripts/image/generate_image.sh --prompt "Mountain landscape" --aspect-ratio 16:9 -n 3 -o minimax-output/landscape.png
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

# ============================================================================
# Common functions
# ============================================================================

load_env() {
    local env_file
    for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
        if [[ -f "$env_file" ]]; then
            while IFS= read -r line || [[ -n "$line" ]]; do
                line="${line%%#*}"; line="$(echo "$line" | xargs)"
                [[ -z "$line" || "$line" != *=* ]] && continue
                local key="${line%%=*}" val="${line#*=}"
                key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
                if [[ ${#val} -ge 2 ]]; then
                    case "$val" in \"*\") val="${val:1:${#val}-2}" ;; \'*\') val="${val:1:${#val}-2}" ;; esac
                fi
                # Use an if-statement instead of `[[ ... ]] && export`: when the key is
                # already set (for example when both .env paths are the same file), the
                # && form leaves this function with status 1, which aborts the script
                # under `set -e`.
                if [[ -z "${!key:-}" ]]; then export "$key=$val"; fi
            done < "$env_file"
        fi
    done
}

check_api_key() {
    if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
        echo "Error: MINIMAX_API_KEY environment variable is not set." >&2; exit 1
    fi
}

image_to_data_url() {
    local path="$1"
    [[ -f "$path" ]] || { echo "Error: Image not found: $path" >&2; exit 1; }
    local mime
    mime="$(file -b --mime-type "$path" 2>/dev/null)" || mime="image/jpeg"
    local b64
    # GNU base64 wraps its output at 76 columns; strip the newlines so the
    # data URL stays on a single line.
    b64="$(base64 < "$path" | tr -d '\n')"
    echo "data:${mime};base64,${b64}"
}

resolve_image() {
    local input="$1"
    [[ -z "$input" ]] && return
    case "$input" in
        http://*|https://*|data:*) echo "$input" ;;
        *) image_to_data_url "$input" ;;
    esac
}

# ============================================================================
# Main
# ============================================================================

main() {
    load_env
    check_api_key

    local mode="t2i" prompt="" model="image-01"
    local aspect_ratio="" width="" height=""
    local response_format="url" n=1 seed=""
    local prompt_optimizer=false aigc_watermark=false
    local ref_image=""
    local output="" download=true

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --mode) mode="$2"; shift 2 ;;
            --prompt) prompt="$2"; shift 2 ;;
            --aspect-ratio|--ratio) aspect_ratio="$2"; shift 2 ;;
            --width) width="$2"; shift 2 ;;
            --height) height="$2"; shift 2 ;;
            --response-format) response_format="$2"; shift 2 ;;
            -n|--count) n="$2"; shift 2 ;;
            --seed) seed="$2"; shift 2 ;;
            --prompt-optimizer) prompt_optimizer=true; shift ;;
            --aigc-watermark) aigc_watermark=true; shift ;;
            --ref-image) ref_image="$2"; shift 2 ;;
            --no-download) download=false; shift ;;
            -o|--output) output="$2"; shift 2 ;;
            -h|--help)
                cat <<'USAGE'
MiniMax Image Generation CLI (model: image-01)

Usage:
  generate_image.sh [--mode MODE] [options] -o OUTPUT

Modes:
  t2i    Text-to-image (default) - generate image from text prompt
  i2i    Image-to-image - generate image using a character reference photo

Options:
  --mode MODE            Generation mode: t2i (default), i2i
  --prompt TEXT          Text description of the image (max 1500 chars, required)
  --aspect-ratio RATIO   Aspect ratio: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
  --width PX             Custom width in pixels (512-2048, multiple of 8)
  --height PX            Custom height in pixels (512-2048, multiple of 8)
  -n, --count N          Number of images to generate (1-9, default: 1)
  --seed N               Random seed for reproducibility
  --prompt-optimizer     Enable automatic prompt optimization
  --aigc-watermark       Add AIGC watermark to generated images
  --ref-image FILE       Character reference image (local file or URL, i2i mode)
  --response-format FMT  Response format: url (default), base64
  --no-download          Don't download, just print URL(s)
  -o, --output FILE      Output file path (required)

Examples:
  # Text-to-image (default)
  generate_image.sh --prompt "A cat on a rooftop at sunset, cinematic" -o cat.png

  # Custom aspect ratio
  generate_image.sh --prompt "Mountain landscape" --aspect-ratio 16:9 -o landscape.png

  # Multiple images
  generate_image.sh --prompt "Abstract art" -n 3 -o art.png

  # Image-to-image with character reference
  generate_image.sh --mode i2i --prompt "A girl reading in a library" --ref-image face.jpg -o girl.png
USAGE
                exit 0
                ;;
            *) echo "Unknown option: $1" >&2; exit 1 ;;
        esac
    done

    if [[ -z "$prompt" ]]; then
        echo "Error: --prompt is required" >&2; exit 1
    fi
    if [[ -z "$output" ]]; then
        echo "Error: --output / -o is required" >&2; exit 1
    fi

    # Validate n range. The regex also rejects non-numeric input such as "3x",
    # which an arithmetic -lt/-gt comparison would let through as a test error.
    if ! [[ "$n" =~ ^[1-9]$ ]]; then
        echo "Error: -n must be between 1 and 9" >&2; exit 1
    fi

    # Build payload
    local payload
    payload=$(jq -n \
        --arg model "$model" \
        --arg prompt "$prompt" \
        --arg rf "$response_format" \
        --argjson n "$n" \
        --argjson po "$prompt_optimizer" \
        --argjson aw "$aigc_watermark" \
        '{model: $model, prompt: $prompt, response_format: $rf, n: $n, prompt_optimizer: $po, aigc_watermark: $aw}')

    [[ -n "$aspect_ratio" ]] && payload=$(echo "$payload" | jq --arg ar "$aspect_ratio" '. + {aspect_ratio: $ar}')
    [[ -n "$width" ]] && payload=$(echo "$payload" | jq --argjson w "$width" '. + {width: $w}')
    [[ -n "$height" ]] && payload=$(echo "$payload" | jq --argjson h "$height" '. + {height: $h}')
    [[ -n "$seed" ]] && payload=$(echo "$payload" | jq --argjson s "$seed" '. + {seed: $s}')

    # Subject reference (i2i mode)
    if [[ "$mode" == "i2i" ]]; then
        if [[ -z "$ref_image" ]]; then
            echo "Error: --ref-image is required for i2i mode" >&2; exit 1
        fi
        local img_url
        img_url="$(resolve_image "$ref_image")"
        payload=$(echo "$payload" | jq --arg img "$img_url" '. + {subject_reference: [{type: "character", image_file: $img}]}')
    fi

    local api_host="${MINIMAX_API_HOST:-https://api.minimaxi.com}"
    local api_url="${api_host}/v1/image_generation"

    echo "Mode: $mode"
    echo "Model: $model"
    echo "Generating $n image(s)..."

    local raw_output http_code response
    raw_output="$(curl -s -w "\n%{http_code}" \
        -X POST "$api_url" \
        -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
        -H "Content-Type: application/json" \
        --max-time 120 \
        -d "$payload" 2>/dev/null)" || {
        echo "Error: curl request failed" >&2
        exit 1
    }

    http_code="${raw_output##*$'\n'}"
    response="${raw_output%$'\n'*}"

    if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
        echo "Error: API returned HTTP $http_code" >&2
        echo "$response" >&2
        exit 1
    fi

    local status_code
    status_code="$(echo "$response" | jq -r '.base_resp.status_code // 0')" 2>/dev/null || true
    if [[ "$status_code" != "0" && -n "$status_code" ]]; then
        local status_msg
        status_msg="$(echo "$response" | jq -r '.base_resp.status_msg // "Unknown error"')"
        echo "Error: API error (code $status_code): $status_msg" >&2
        exit 1
    fi

    local success_count failed_count
    success_count="$(echo "$response" | jq -r '.metadata.success_count // 0')" 2>/dev/null || true
    failed_count="$(echo "$response" | jq -r '.metadata.failed_count // 0')" 2>/dev/null || true
    echo "Success: $success_count, Failed: $failed_count"

    mkdir -p "$(dirname "$output")"

    if [[ "$response_format" == "base64" ]]; then
        local count
        count="$(echo "$response" | jq '.data.image_base64 | length')" 2>/dev/null || count=0
        if [[ "$count" -eq 0 ]]; then
            echo "Error: No image data in response" >&2; exit 1
        fi

        if [[ "$count" -eq 1 ]]; then
            echo "$response" | jq -r '.data.image_base64[0]' | base64 -d > "$output"
            echo "Image saved to: $output"
        else
            local ext="${output##*.}"
            local base="${output%.*}"
            for ((i=0; i<count; i++)); do
                local out_file="${base}_$((i+1)).${ext}"
                echo "$response" | jq -r ".data.image_base64[$i]" | base64 -d > "$out_file"
                echo "Image saved to: $out_file"
            done
        fi

    elif [[ "$response_format" == "url" ]]; then
        local count
        count="$(echo "$response" | jq '.data.image_urls | length')" 2>/dev/null || count=0
        if [[ "$count" -eq 0 ]]; then
            echo "Error: No image URLs in response" >&2
            echo "$response" | jq . >&2
            exit 1
        fi

        if $download; then
            if [[ "$count" -eq 1 ]]; then
                local img_url
                img_url="$(echo "$response" | jq -r '.data.image_urls[0]')"
                echo "URL: $img_url"
                # -f makes curl fail on HTTP errors instead of saving an error page
                # as the image; -S still prints the reason despite -s.
                curl -fsS -o "$output" --max-time 120 "$img_url"
                echo "Image downloaded to: $output"
            else
                local ext="${output##*.}"
                local base="${output%.*}"
                for ((i=0; i<count; i++)); do
                    local img_url out_file
                    img_url="$(echo "$response" | jq -r ".data.image_urls[$i]")"
                    out_file="${base}_$((i+1)).${ext}"
                    echo "URL $((i+1)): $img_url"
                    curl -fsS -o "$out_file" --max-time 120 "$img_url"
                    echo "Image downloaded to: $out_file"
                done
            fi
        else
            for ((i=0; i<count; i++)); do
                local img_url
                img_url="$(echo "$response" | jq -r ".data.image_urls[$i]")"
                echo "Image URL $((i+1)): $img_url"
            done
            echo "Use without --no-download to save files automatically."
        fi
    fi

    echo "Done!"
}

main "$@"
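generate_image.sh gets both the response body and the HTTP status from a single curl call by appending `\n%{http_code}` to the output and then splitting on the last newline. A minimal sketch of that split follows, written in a POSIX-compatible spelling (a literal newline variable instead of `$'\n'`). The helper names `http_status` and `http_body` and the sample response are illustrative assumptions, not part of the script.

```shell
# Hypothetical helpers (names are illustrative, not part of generate_image.sh).
nl='
'
# Text after the last newline: the status code appended by -w "\n%{http_code}".
http_status() { s=${1##*"$nl"}; printf '%s' "$s"; }
# Text before the last newline: the response body.
http_body() { b=${1%"$nl"*}; printf '%s' "$b"; }

# Stand-in for the output of: curl -s -w "\n%{http_code}" ...
raw='{"base_resp":{"status_code":0}}
200'

echo "status: $(http_status "$raw")"
echo "body: $(http_body "$raw")"
```

Splitting on the last newline rather than the first keeps the approach safe for bodies that themselves contain newlines.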
543
skills/minimax-multimodal-toolkit/scripts/media_tools.sh
Executable file
@@ -0,0 +1,543 @@
#!/usr/bin/env bash
# MiniMax Multi-Modal Toolkit Media Tools CLI (pure bash)
#
# FFmpeg-based utilities for audio/video format conversion, concatenation,
# extraction, and trimming.
#
# Usage:
#   bash scripts/media_tools.sh convert-video input.webm -o output.mp4
#   bash scripts/media_tools.sh convert-audio input.wav -o output.mp3
#   bash scripts/media_tools.sh concat-video seg1.mp4 seg2.mp4 -o merged.mp4
#   bash scripts/media_tools.sh concat-audio part1.mp3 part2.mp3 -o combined.mp3
#   bash scripts/media_tools.sh extract-audio input.mp4 -o audio.mp3
#   bash scripts/media_tools.sh trim-video input.mp4 --start 5 --end 15 -o clip.mp4
#   bash scripts/media_tools.sh add-audio --video video.mp4 --audio bgm.mp3 -o output.mp4
#   bash scripts/media_tools.sh probe input.mp4
set -euo pipefail

# ============================================================================
# Probe / info helpers
# ============================================================================

probe_media() {
    ffprobe -v error -show_format -show_streams -of json "$1" 2>/dev/null
}

get_duration() {
    probe_media "$1" | jq -r '.format.duration // "0"'
}

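get_video_fps() {
    local fps_str
    fps_str="$(ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of csv=p=0 "$1" 2>/dev/null)" || { echo 25; return; }
    local num="${fps_str%/*}" den="${fps_str#*/}"
    # Validate before doing arithmetic: an empty value or a zero denominator makes
    # the $(( )) expansion itself fail, and a trailing `|| echo 25` does not
    # reliably catch that.
    if [[ "$num" =~ ^[0-9]+$ && "$den" =~ ^[1-9][0-9]*$ ]]; then
        echo $(( (num + den/2) / den ))
    else
        echo 25
    fi
}
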
has_audio_stream() {
    local out
    out="$(ffprobe -v error -select_streams a -show_entries stream=codec_type -of csv=p=0 "$1" 2>/dev/null)"
    [[ "$out" == *audio* ]]
}

has_video_stream() {
    local out
    out="$(ffprobe -v error -select_streams v -show_entries stream=codec_type -of csv=p=0 "$1" 2>/dev/null)"
    [[ "$out" == *video* ]]
}

# ============================================================================
# Video codec maps
# ============================================================================

video_codec_for() {
    case "$1" in
        mp4|mov|mkv|avi|ts|flv) echo "libx264" ;;
        webm) echo "libvpx-vp9" ;;
        *) echo "libx264" ;;
    esac
}

audio_codec_for_container() {
    case "$1" in
        mp4|mov|mkv|ts|flv) echo "aac" ;;
        webm) echo "libopus" ;;
        avi) echo "mp3" ;;
        *) echo "aac" ;;
    esac
}

audio_codec_for_format() {
    case "$1" in
        mp3) echo "libmp3lame" ;;
        wav) echo "pcm_s16le" ;;
        flac) echo "flac" ;;
        ogg) echo "libvorbis" ;;
        aac|m4a) echo "aac" ;;
        opus) echo "libopus" ;;
        wma) echo "wmav2" ;;
        *) echo "libmp3lame" ;;
    esac
}

get_ext() {
    local name="$1"
    echo "${name##*.}" | tr '[:upper:]' '[:lower:]'
}

# ============================================================================
# Subcommand: convert-video
# ============================================================================
cmd_convert_video() {
    local input="" output="" crf=18 preset="medium" resolution="" fps=""

    if [[ $# -gt 0 && "$1" != -* ]]; then input="$1"; shift; fi
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --crf) crf="$2"; shift 2 ;;
            --preset) preset="$2"; shift 2 ;;
            --resolution) resolution="$2"; shift 2 ;;
            --fps) fps="$2"; shift 2 ;;
            *) [[ -z "$input" ]] && input="$1"; shift ;;
        esac
    done

    [[ -z "$input" || ! -f "$input" ]] && { echo "Error: Input file not found: ${input:-<none>}" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    local ext; ext="$(get_ext "$output")"
    local v_codec; v_codec="$(video_codec_for "$ext")"
    local a_codec; a_codec="$(audio_codec_for_container "$ext")"

    mkdir -p "$(dirname "$output")"

    local cmd=(ffmpeg -y -i "$input")

    # Video filters
    if [[ -n "$resolution" ]]; then
        local w="${resolution%%x*}" h="${resolution##*x}"
        cmd+=(-vf "scale=${w}:${h}")
    fi

    cmd+=(-c:v "$v_codec")
    case "$v_codec" in
        libx264|libx265) cmd+=(-crf "$crf" -preset "$preset" -pix_fmt yuv420p) ;;
        libvpx-vp9) cmd+=(-crf "$crf" -b:v 0) ;;
    esac

    [[ -n "$fps" ]] && cmd+=(-r "$fps")

    if has_audio_stream "$input"; then
        cmd+=(-c:a "$a_codec" -b:a 192k)
    else
        cmd+=(-an)
    fi

    cmd+=("$output")

    echo "Converting: $input -> $output ($v_codec/$a_codec)"
    "${cmd[@]}" 2>/dev/null
    echo " Done: $output"
}

# ============================================================================
# Subcommand: convert-audio
# ============================================================================
cmd_convert_audio() {
    local input="" output="" bitrate="192k" sample_rate="" channels=""

    if [[ $# -gt 0 && "$1" != -* ]]; then input="$1"; shift; fi
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --bitrate) bitrate="$2"; shift 2 ;;
            --sample-rate) sample_rate="$2"; shift 2 ;;
            --channels) channels="$2"; shift 2 ;;
            *) [[ -z "$input" ]] && input="$1"; shift ;;
        esac
    done

    [[ -z "$input" || ! -f "$input" ]] && { echo "Error: Input file not found: ${input:-<none>}" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    local ext; ext="$(get_ext "$output")"
    local codec; codec="$(audio_codec_for_format "$ext")"

    mkdir -p "$(dirname "$output")"

    local cmd=(ffmpeg -y -i "$input" -c:a "$codec" -b:a "$bitrate")
    [[ -n "$sample_rate" ]] && cmd+=(-ar "$sample_rate")
    [[ -n "$channels" ]] && cmd+=(-ac "$channels")
    cmd+=("$output")

    echo "Converting audio: $input -> $output ($codec)"
    "${cmd[@]}" 2>/dev/null
    echo " Done: $output"
}

# ============================================================================
# Subcommand: concat-video
# ============================================================================
cmd_concat_video() {
    local output="" crossfade=0.5
    local inputs=()

    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --crossfade) crossfade="$2"; shift 2 ;;
            *) inputs+=("$1"); shift ;;
        esac
    done

    [[ ${#inputs[@]} -lt 2 ]] && { echo "Error: At least 2 input files required" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    mkdir -p "$(dirname "$output")"

    # Not reachable while the check above requires at least 2 inputs; kept as a guard.
    if [[ ${#inputs[@]} -eq 1 ]]; then
        cp "${inputs[0]}" "$output"
        return 0
    fi

    local fps; fps="$(get_video_fps "${inputs[0]}")"
    local has_audio=true
    for vp in "${inputs[@]}"; do
        has_audio_stream "$vp" || { has_audio=false; break; }
    done

    if [[ "$(echo "$crossfade > 0" | bc -l)" == "1" ]]; then
        local durations=()
        for vp in "${inputs[@]}"; do durations+=("$(get_duration "$vp")"); done

        local ff_inputs=()
        for vp in "${inputs[@]}"; do ff_inputs+=(-i "$(cd "$(dirname "$vp")" && pwd)/$(basename "$vp")"); done

        local n=${#inputs[@]}
        local offsets=() cumulative=0
        # Each xfade starts one crossfade length before the end of the video built so far.
        for ((i=0; i<n-1; i++)); do
            local offset; offset="$(echo "$cumulative + ${durations[$i]} - $crossfade" | bc -l)"
            offsets+=("$offset"); cumulative="$offset"
        done

        local vf_parts=() af_parts=()
        if [[ $n -eq 2 ]]; then
            vf_parts+=("[0:v][1:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[0]}[vout]")
            $has_audio && af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[aout]")
        else
            vf_parts+=("[0:v][1:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[0]}[xv1]")
            $has_audio && af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[xa1]")
            for ((i=2; i<n; i++)); do
                local out_v="[xv${i}]" out_a="[xa${i}]"
                [[ $i -eq $((n-1)) ]] && { out_v="[vout]"; out_a="[aout]"; }
                vf_parts+=("[xv$((i-1))][${i}:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[$((i-1))]}${out_v}")
                $has_audio && af_parts+=("[xa$((i-1))][${i}:a]acrossfade=d=${crossfade}:c1=tri:c2=tri${out_a}")
            done
        fi

        local fc
        fc="$(IFS=';'; echo "${vf_parts[*]}${af_parts[*]:+;${af_parts[*]}}")"

        local cmd=(ffmpeg -y "${ff_inputs[@]}" -filter_complex "$fc" -map "[vout]")
        $has_audio && cmd+=(-map "[aout]")
        cmd+=(-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -r "$fps")
        $has_audio && cmd+=(-c:a aac -b:a 192k)
        cmd+=("$output")

        echo "Concatenating $n videos with ${crossfade}s crossfade..."
        if "${cmd[@]}" 2>/dev/null; then
            echo " Done: $output"
            return 0
        fi
        echo " Crossfade failed, falling back to re-encode..."
    fi

    # Fallback: concat demuxer with re-encode
    # The template ends in X's (no .txt suffix) because BSD/macOS mktemp only
    # substitutes trailing X's.
    local concat_file; concat_file="$(mktemp "${TMPDIR:-/tmp}/concat_XXXXXX")"
    for vp in "${inputs[@]}"; do
        echo "file '$(cd "$(dirname "$vp")" && pwd)/$(basename "$vp")'" >> "$concat_file"
    done
    ffmpeg -y -f concat -safe 0 -i "$concat_file" \
        -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -r "$fps" \
        -c:a aac -b:a 192k "$output" 2>/dev/null
    rm -f "$concat_file"
    echo " Done: $output"
}

# ============================================================================
# Subcommand: concat-audio
# ============================================================================
cmd_concat_audio() {
    local output="" crossfade=0
    local inputs=()

    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --crossfade) crossfade="$2"; shift 2 ;;
            *) inputs+=("$1"); shift ;;
        esac
    done

    [[ ${#inputs[@]} -lt 1 ]] && { echo "Error: At least 1 input file required" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    mkdir -p "$(dirname "$output")"

    if [[ ${#inputs[@]} -eq 1 ]]; then
        cp "${inputs[0]}" "$output"
        echo " Done: $output"
        return 0
    fi

    local ext; ext="$(get_ext "$output")"
    local codec; codec="$(audio_codec_for_format "$ext")"
    local n=${#inputs[@]}

    if [[ "$(echo "$crossfade > 0" | bc -l)" == "1" ]]; then
        local ff_inputs=()
        for ap in "${inputs[@]}"; do ff_inputs+=(-i "$(cd "$(dirname "$ap")" && pwd)/$(basename "$ap")"); done

        local af_parts=()
        if [[ $n -eq 2 ]]; then
            af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[aout]")
        else
            af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[xa1]")
            for ((i=2; i<n; i++)); do
                local prev="[xa$((i-1))]" out="[xa${i}]"
                [[ $i -eq $((n-1)) ]] && out="[aout]"
                af_parts+=("${prev}[${i}:a]acrossfade=d=${crossfade}:c1=tri:c2=tri${out}")
            done
        fi

        local fc; fc="$(IFS=';'; echo "${af_parts[*]}")"

        echo "Concatenating $n audio files with ${crossfade}s crossfade..."
        if ffmpeg -y "${ff_inputs[@]}" -filter_complex "$fc" -map "[aout]" \
            -c:a "$codec" -b:a 192k "$output" 2>/dev/null; then
            echo " Done: $output"
            return 0
        fi
        echo " Crossfade failed, falling back..."
    fi

    # Fallback: concat demuxer
    # The template ends in X's (no .txt suffix) because BSD/macOS mktemp only
    # substitutes trailing X's.
    local concat_file; concat_file="$(mktemp "${TMPDIR:-/tmp}/concat_XXXXXX")"
    for ap in "${inputs[@]}"; do
        echo "file '$(cd "$(dirname "$ap")" && pwd)/$(basename "$ap")'" >> "$concat_file"
    done
    ffmpeg -y -f concat -safe 0 -i "$concat_file" -c:a "$codec" -b:a 192k "$output" 2>/dev/null
    rm -f "$concat_file"
    echo " Done: $output"
}

# ============================================================================
# Subcommand: extract-audio
# ============================================================================
cmd_extract_audio() {
    local input="" output="" bitrate="192k"

    if [[ $# -gt 0 && "$1" != -* ]]; then input="$1"; shift; fi
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --bitrate) bitrate="$2"; shift 2 ;;
            *) [[ -z "$input" ]] && input="$1"; shift ;;
        esac
    done

    [[ -z "$input" || ! -f "$input" ]] && { echo "Error: Input not found: ${input:-<none>}" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }
    has_audio_stream "$input" || { echo "Error: No audio stream in $input" >&2; exit 1; }

    local ext; ext="$(get_ext "$output")"
    local codec; codec="$(audio_codec_for_format "$ext")"

    mkdir -p "$(dirname "$output")"

    echo "Extracting audio: $input -> $output"
    ffmpeg -y -i "$input" -vn -c:a "$codec" -b:a "$bitrate" "$output" 2>/dev/null
    echo " Done: $output"
}

# ============================================================================
# Subcommand: trim-video
# ============================================================================
cmd_trim_video() {
    local input="" output="" start="" end="" duration=""

    if [[ $# -gt 0 && "$1" != -* ]]; then input="$1"; shift; fi
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output) output="$2"; shift 2 ;;
            --start) start="$2"; shift 2 ;;
            --end) end="$2"; shift 2 ;;
            --duration) duration="$2"; shift 2 ;;
            *) [[ -z "$input" ]] && input="$1"; shift ;;
        esac
    done

    [[ -z "$input" || ! -f "$input" ]] && { echo "Error: Input not found: ${input:-<none>}" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    mkdir -p "$(dirname "$output")"

    local cmd=(ffmpeg -y)
    [[ -n "$start" ]] && cmd+=(-ss "$start")
    cmd+=(-i "$input")

    if [[ -n "$duration" ]]; then
        cmd+=(-t "$duration")
    elif [[ -n "$end" ]]; then
        local actual_start="${start:-0}"
        local dur; dur="$(echo "$end - $actual_start" | bc -l)"
        cmd+=(-t "$dur")
    fi

    cmd+=(-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p)
    has_audio_stream "$input" && cmd+=(-c:a aac -b:a 192k)
    cmd+=("$output")

    local start_str="${start:-0}s"
    local end_str="${end:+${end}s}"
    [[ -z "$end_str" && -n "$duration" ]] && end_str="+${duration}s"
    [[ -z "$end_str" ]] && end_str="end"
    echo "Trimming: $input [$start_str - $end_str] -> $output"
    "${cmd[@]}" 2>/dev/null
    echo " Done: $output"
}

# ============================================================================
# Subcommand: add-audio
# ============================================================================
cmd_add_audio() {
    local video="" audio="" output="" volume=1.0 fade_in=0 fade_out=0 replace=false

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --video) video="$2"; shift 2 ;;
            --audio) audio="$2"; shift 2 ;;
            -o|--output) output="$2"; shift 2 ;;
            --volume) volume="$2"; shift 2 ;;
            --fade-in) fade_in="$2"; shift 2 ;;
            --fade-out) fade_out="$2"; shift 2 ;;
            --replace) replace=true; shift ;;
            *) echo "Unknown option: $1" >&2; exit 1 ;;
        esac
    done

    [[ -z "$video" || ! -f "$video" ]] && { echo "Error: Video not found: ${video:-<none>}" >&2; exit 1; }
    [[ -z "$audio" || ! -f "$audio" ]] && { echo "Error: Audio not found: ${audio:-<none>}" >&2; exit 1; }
    [[ -z "$output" ]] && { echo "Error: -o/--output required" >&2; exit 1; }

    mkdir -p "$(dirname "$output")"

    local duration; duration="$(get_duration "$video")"
    local video_audio=false
    has_audio_stream "$video" && video_audio=true

    local af="[1:a]volume=${volume}"
    [[ "$(echo "$fade_in > 0" | bc -l)" == "1" ]] && af+=",afade=t=in:d=${fade_in}"
    if [[ "$(echo "$fade_out > 0" | bc -l)" == "1" ]]; then
        local fo_start; fo_start="$(echo "$duration - $fade_out" | bc -l)"
        [[ "$(echo "$fo_start < 0" | bc -l)" == "1" ]] && fo_start=0
        af+=",afade=t=out:st=${fo_start}:d=${fade_out}"
    fi

    if $video_audio && ! $replace; then
        af+="[newaudio];[0:a][newaudio]amix=inputs=2:duration=first:dropout_transition=2[aout]"
        local mode="mixing with"
    else
        af+="[aout]"
        local mode="replacing"
    fi

    echo "Adding audio ($mode original): $output"
    ffmpeg -y -i "$video" -i "$audio" \
        -filter_complex "$af" \
        -map 0:v -map "[aout]" \
        -c:v copy -c:a aac -b:a 192k -shortest "$output" 2>/dev/null
    echo " Done: $output"
}

# ============================================================================
# Subcommand: probe
# ============================================================================
cmd_probe() {
    local input=""
    if [[ $# -gt 0 ]]; then input="$1"; fi

    [[ -z "$input" || ! -f "$input" ]] && { echo "Error: File not found: ${input:-<none>}" >&2; exit 1; }

    local info; info="$(probe_media "$input")"

    local fmt_name dur size br
    fmt_name="$(echo "$info" | jq -r '.format.format_long_name // "unknown"')"
    dur="$(echo "$info" | jq -r '.format.duration // "0"')"
    size="$(echo "$info" | jq -r '.format.size // "0"')"
    br="$(echo "$info" | jq -r '.format.bit_rate // "0"')"

    echo "File: $input"
    echo "Format: $fmt_name"
    printf "Duration: %.2fs\n" "$dur"
    printf "Size: %.2f MB\n" "$(echo "$size / 1048576" | bc -l)"
    printf "Bitrate: %.0f kbps\n" "$(echo "$br / 1000" | bc -l)"

    echo "$info" | jq -r '.streams[] | if .codec_type == "video" then "Video: \(.codec_name) \(.width)x\(.height) @ \(.r_frame_rate) fps" elif .codec_type == "audio" then "Audio: \(.codec_name) \(.sample_rate)Hz \(.channels)ch" else empty end'
}

# ============================================================================
# Main dispatcher
# ============================================================================

usage() {
    cat <<'EOF'
MiniMax Multi-Modal Toolkit Media Tools

Usage:
  media_tools.sh <command> [options]

Commands:
  convert-video   Convert video format
  convert-audio   Convert audio format
  concat-video    Concatenate videos with crossfade
  concat-audio    Concatenate audio files
  extract-audio   Extract audio from video
  trim-video      Trim video by time range
  add-audio       Add/overlay audio on video
  probe           Show media file info

Examples:
  media_tools.sh convert-video input.webm -o output.mp4
  media_tools.sh convert-audio input.wav -o output.mp3
  media_tools.sh concat-video seg1.mp4 seg2.mp4 -o merged.mp4
  media_tools.sh extract-audio video.mp4 -o audio.mp3
  media_tools.sh trim-video input.mp4 --start 5 --end 15 -o clip.mp4
  media_tools.sh add-audio --video video.mp4 --audio bgm.mp3 -o output.mp4
  media_tools.sh probe input.mp4
EOF
}

main() {
    if [[ $# -eq 0 ]]; then
        usage; exit 0
    fi

    local command="$1"; shift

    case "$command" in
        convert-video) cmd_convert_video "$@" ;;
        convert-audio) cmd_convert_audio "$@" ;;
        concat-video) cmd_concat_video "$@" ;;
        concat-audio) cmd_concat_audio "$@" ;;
        extract-audio) cmd_extract_audio "$@" ;;
        trim-video) cmd_trim_video "$@" ;;
        add-audio) cmd_add_audio "$@" ;;
        probe) cmd_probe "$@" ;;
        -h|--help|help) usage ;;
        *) echo "Unknown command: $command" >&2; usage >&2; exit 1 ;;
    esac
}

main "$@"
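The crossfade branch of concat-video derives one `xfade` offset per transition: each offset is the running length of the video built so far, plus the next clip's duration, minus the crossfade length. A minimal sketch of that arithmetic in isolation follows. It swaps the script's `bc` calls for a single awk program purely for illustration, and the function name `xfade_offsets` is hypothetical.

```shell
# Hypothetical helper (not part of media_tools.sh).
# usage: xfade_offsets CROSSFADE DURATION [DURATION...]
# Prints one offset per transition, i.e. one fewer than the number of clips.
xfade_offsets() {
    awk 'BEGIN {
        fade = ARGV[1]; cumulative = 0
        for (i = 2; i < ARGC - 1; i++) {   # the last clip starts no transition
            cumulative += ARGV[i] - fade
            printf "%.3f\n", cumulative
        }
    }' "$@"
}

# Three clips of 5 s, 6 s and 4 s with a 0.5 s crossfade
xfade_offsets 0.5 5 6 4
```

For the sample call the first transition starts at 4.5 s (5 - 0.5) and the second at 10 s (4.5 + 6 - 0.5), which matches how the script feeds `cumulative` back into the next offset.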
266
skills/minimax-multimodal-toolkit/scripts/music/generate_music.sh
Executable file
@@ -0,0 +1,266 @@
|
||||
#!/usr/bin/env bash
|
||||
# MiniMax Music Generation CLI (pure bash)
|
||||
#
|
||||
# Usage:
|
||||
# bash scripts/music/generate_music.sh --lyrics "[verse]\nHello world" --output output/song.mp3 --download
|
||||
# bash scripts/music/generate_music.sh --instrumental --prompt "ambient electronic" -o output/ambient.mp3 --download
|
||||
# bash scripts/music/generate_music.sh --lyrics "[verse]\nStars" --genre pop --mood happy -o output/happy.mp3 --download
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
# ============================================================================
|
||||
# Common functions (shared with generate_voice.sh)
|
||||
# ============================================================================
|
||||
|
||||
load_env() {
|
||||
local env_file
|
||||
for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
|
||||
if [[ -f "$env_file" ]]; then
|
||||
while IFS= read -r line || [[ -n "$line" ]]; do
|
||||
line="${line%%#*}"
|
||||
line="$(echo "$line" | xargs)"
|
||||
[[ -z "$line" || "$line" != *=* ]] && continue
|
||||
local key="${line%%=*}" val="${line#*=}"
|
||||
key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
|
||||
if [[ ${#val} -ge 2 ]]; then
|
||||
case "$val" in
|
||||
\"*\") val="${val:1:${#val}-2}" ;;
|
||||
\'*\') val="${val:1:${#val}-2}" ;;
|
||||
esac
|
||||
fi
|
||||
[[ -z "${!key:-}" ]] && export "$key=$val"
|
||||
done < "$env_file"
|
||||
return 0
|
||||
fi
|
||||
done
|
||||
}
|
||||
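The `load_env` function above trims whitespace by piping through `xargs`, which also interprets quotes and backslashes and fails on an unmatched quote (for example a value containing an apostrophe). A minimal sketch of a pure-bash alternative follows. The helper names `trim` and `load_env_file` are hypothetical and not part of the script, and unlike the original this sketch only skips full-line comments, so an inline `#` stays part of the value.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of generate_music.sh.

# trim STRING: print STRING without leading or trailing whitespace.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
  printf '%s' "$s"
}

# load_env_file FILE: export KEY=VALUE pairs that are not already set.
load_env_file() {
  local file="$1" line key val
  [[ -f "$file" ]] || return 0
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="$(trim "$line")"
    # Skip blank lines, full-line comments, and lines without '='.
    [[ -z "$line" || "$line" == \#* || "$line" != *=* ]] && continue
    key="$(trim "${line%%=*}")"
    val="$(trim "${line#*=}")"
    # Strip one pair of matching surrounding quotes.
    if [[ ${#val} -ge 2 ]]; then
      case "$val" in
        \"*\"|\'*\') val="${val:1:${#val}-2}" ;;
      esac
    fi
    # Accept only valid shell identifiers as keys.
    [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Values already present in the environment take precedence.
    if [[ -z "${!key:-}" ]]; then
      export "$key=$val"
    fi
  done < "$file"
}
```

Validating the key before the indirect expansion `${!key}` keeps a malformed line from aborting the script under `set -e`.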
|
||||
check_api_key() {
|
||||
if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
|
||||
echo "Error: MINIMAX_API_KEY environment variable is not set." >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
# ============================================================================
|
||||
# Main
|
||||
# ============================================================================
|
||||
|
||||
main() {
|
||||
load_env
|
||||
check_api_key
|
||||
|
||||
local lyrics="" prompt="" model="music-2.5" instrumental=false
|
||||
local genre="" mood="" tempo="" bpm="" key="" instruments="" vocals=""
|
||||
local use_case="" structure="" avoid="" references=""
|
||||
local output="" output_format="url" stream=false download=false
|
||||
local sample_rate="" bitrate="" format="" aigc_watermark=""
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--lyrics) lyrics="$2"; shift 2 ;;
|
||||
--prompt) prompt="$2"; shift 2 ;;
|
||||
--model) model="$2"; shift 2 ;;
|
||||
--instrumental) instrumental=true; shift ;;
|
||||
--genre) genre="$2"; shift 2 ;;
|
||||
--mood) mood="$2"; shift 2 ;;
|
||||
--tempo) tempo="$2"; shift 2 ;;
|
||||
--bpm) bpm="$2"; shift 2 ;;
|
||||
--key) key="$2"; shift 2 ;;
|
||||
--instruments) instruments="$2"; shift 2 ;;
|
||||
--vocals) vocals="$2"; shift 2 ;;
|
||||
--use-case) use_case="$2"; shift 2 ;;
|
||||
--structure) structure="$2"; shift 2 ;;
|
||||
--avoid) avoid="$2"; shift 2 ;;
|
||||
--references) references="$2"; shift 2 ;;
|
||||
-o|--output) output="$2"; shift 2 ;;
|
||||
--output-format) output_format="$2"; shift 2 ;;
|
||||
--stream) stream=true; shift ;;
|
||||
--download) download=true; shift ;;
|
||||
--sample-rate) sample_rate="$2"; shift 2 ;;
|
||||
--bitrate) bitrate="$2"; shift 2 ;;
|
||||
--format) format="$2"; shift 2 ;;
|
||||
--aigc-watermark) aigc_watermark="$2"; shift 2 ;;
|
||||
-h|--help)
|
||||
cat <<'USAGE'
|
||||
MiniMax Music Generation CLI
|
||||
|
||||
Usage:
|
||||
generate_music.sh [options]
|
||||
|
||||
Options:
|
||||
--lyrics TEXT Song lyrics (with [verse]/[chorus] tags)
|
||||
--prompt TEXT Music style/description prompt
|
||||
--instrumental Generate instrumental (no vocals)
|
||||
--model MODEL Model name (default: music-2.5)
|
||||
--genre TEXT Genre (e.g. pop, rock, jazz)
|
||||
--mood TEXT Mood (e.g. happy, melancholic)
|
||||
--tempo TEXT Tempo description (e.g. fast, slow)
|
||||
--bpm NUMBER Beats per minute
|
||||
--key TEXT Musical key (e.g. C major, A minor)
|
||||
--instruments TEXT Instruments to include
|
||||
--vocals TEXT Vocal style description
|
||||
--use-case TEXT Use case (e.g. background, theme song)
|
||||
--structure TEXT Song structure
|
||||
--avoid TEXT Elements to avoid
|
||||
--references TEXT Reference tracks/artists
|
||||
--output-format FMT Output format: url (default) or hex
|
||||
--download Download audio file (for url format)
|
||||
--sample-rate N Audio sample rate
|
||||
--bitrate N Audio bitrate
|
||||
--format FMT Audio format (mp3, wav, etc.)
|
||||
-o, --output FILE Output file path (required)
|
||||
|
||||
Examples:
|
||||
generate_music.sh --instrumental --prompt "ambient electronic" -o ambient.mp3 --download
|
||||
generate_music.sh --lyrics "[verse]\nHello world" -o song.mp3 --download
|
||||
generate_music.sh --lyrics "[verse]\nStars" --genre pop --mood happy -o happy.mp3 --download
|
||||
USAGE
|
||||
exit 0
|
||||
;;
|
||||
*) echo "Unknown option: $1" >&2; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$output" ]]; then
|
||||
echo "Error: --output / -o is required" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Build prompt from structured fields
|
||||
local field_parts=()
|
||||
[[ -n "$genre" ]] && field_parts+=("Genre: $genre")
|
||||
[[ -n "$mood" ]] && field_parts+=("Mood: $mood")
|
||||
[[ -n "$tempo" ]] && field_parts+=("Tempo: $tempo")
|
||||
[[ -n "$bpm" ]] && field_parts+=("BPM: $bpm")
|
||||
[[ -n "$key" ]] && field_parts+=("Key: $key")
|
||||
[[ -n "$instruments" ]] && field_parts+=("Instruments: $instruments")
|
||||
[[ -n "$vocals" ]] && field_parts+=("Vocals: $vocals")
|
||||
[[ -n "$use_case" ]] && field_parts+=("Use case: $use_case")
|
||||
[[ -n "$structure" ]] && field_parts+=("Structure: $structure")
|
||||
[[ -n "$avoid" ]] && field_parts+=("Avoid: $avoid")
|
||||
[[ -n "$references" ]] && field_parts+=("References: $references")
|
||||
|
||||
local field_prompt=""
|
||||
if [[ ${#field_parts[@]} -gt 0 ]]; then
|
||||
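# "${arr[*]}" joins with only the first IFS character, so build the ". " separator explicitly
field_prompt="$(printf '%s. ' "${field_parts[@]}")"; field_prompt="${field_prompt%. }"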
|
||||
fi
|
||||
|
||||
if [[ -n "$field_prompt" ]]; then
|
||||
if [[ -n "$prompt" ]]; then
|
||||
prompt="$prompt. $field_prompt"
|
||||
else
|
||||
prompt="$field_prompt"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Build payload
|
||||
local payload
|
||||
payload=$(jq -n \
|
||||
--arg model "$model" \
|
||||
--arg prompt "$prompt" \
|
||||
--arg of "$output_format" \
|
||||
--argjson stream "$stream" \
|
||||
'{model: $model, prompt: $prompt, output_format: $of, stream: $stream}')
|
||||
|
||||
if $instrumental; then
|
||||
# music-2.5 does not support is_instrumental — use lyrics workaround
|
||||
payload=$(echo "$payload" | jq '. + {lyrics: "[intro] [outro]"}')
|
||||
local current_prompt
|
||||
current_prompt="$(echo "$payload" | jq -r '.prompt // ""')"
|
||||
if [[ -n "$current_prompt" ]]; then
|
||||
payload=$(echo "$payload" | jq --arg p "$current_prompt. pure music, no lyrics" '.prompt = $p')
|
||||
else
|
||||
payload=$(echo "$payload" | jq '.prompt = "pure music, no lyrics"')
|
||||
fi
|
||||
else
|
||||
payload=$(echo "$payload" | jq --arg l "$lyrics" '. + {lyrics: $l}')
|
||||
fi
|
||||
|
||||
# Audio settings
|
||||
local audio_setting="{}"
|
||||
[[ -n "$sample_rate" ]] && audio_setting=$(echo "$audio_setting" | jq --argjson sr "$sample_rate" '. + {sample_rate: $sr}')
|
||||
[[ -n "$bitrate" ]] && audio_setting=$(echo "$audio_setting" | jq --argjson br "$bitrate" '. + {bitrate: $br}')
|
||||
[[ -n "$format" ]] && audio_setting=$(echo "$audio_setting" | jq --arg f "$format" '. + {format: $f}')
|
||||
if [[ "$audio_setting" != "{}" ]]; then
|
||||
payload=$(echo "$payload" | jq --argjson as "$audio_setting" '. + {audio_setting: $as}')
|
||||
fi
|
||||
|
||||
[[ -n "$aigc_watermark" ]] && payload=$(echo "$payload" | jq --argjson aw "$aigc_watermark" '. + {aigc_watermark: $aw}')
|
||||
|
||||
local api_host="${MINIMAX_API_HOST:-https://api.minimaxi.com}"
|
||||
local api_url="${api_host}/v1/music_generation"
|
||||
|
||||
echo "Generating music with model: $model"
|
||||
echo "Output format: $output_format"
|
||||
|
||||
# Send request via curl
|
||||
local raw_output http_code response
|
||||
raw_output="$(curl -s -w "\n%{http_code}" \
|
||||
-X POST "$api_url" \
|
||||
-H "Authorization: Bearer ${MINIMAX_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
--max-time 300 \
|
||||
-d "$payload" 2>/dev/null)" || {
|
||||
echo "Error: curl request failed" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
http_code="${raw_output##*$'\n'}"
|
||||
response="${raw_output%$'\n'*}"
|
||||
|
||||
if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
|
||||
echo "Error: API returned HTTP $http_code" >&2
|
||||
echo "$response" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local status_code
|
||||
status_code="$(echo "$response" | jq -r '.base_resp.status_code // 0' 2>/dev/null)" || true
|
||||
if [[ "$status_code" != "0" && -n "$status_code" ]]; then
|
||||
echo "API error: $(echo "$response" | jq '.base_resp')" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
mkdir -p "$(dirname "$output")"
|
||||
|
||||
if [[ "$output_format" == "hex" ]]; then
|
||||
local audio_hex
|
||||
audio_hex="$(echo "$response" | jq -r '.data.audio // empty')"
|
||||
if [[ -z "$audio_hex" ]]; then
|
||||
echo "Error: No audio hex data in response." >&2
|
||||
exit 1
|
||||
fi
|
||||
echo "$audio_hex" | xxd -r -p > "$output"
|
||||
echo "Audio saved to: $output"
|
||||
|
||||
elif [[ "$output_format" == "url" ]]; then
|
||||
local audio_url
|
||||
audio_url="$(echo "$response" | jq -r '.data.audio_url // .data.audio // .data.audio_file.download_url // empty')"
|
||||
if [[ -z "$audio_url" ]]; then
|
||||
echo "Error: No audio URL in response." >&2
|
||||
echo "$response" | jq . >&2
|
||||
exit 1
|
||||
fi
|
||||
echo "Audio URL: $audio_url"
|
||||
if $download; then
|
||||
curl -fsS -o "$output" --max-time 120 "$audio_url"
|
||||
echo "Audio downloaded to: $output"
|
||||
else
|
||||
echo "Use --download to save the file, or download manually from the URL above."
|
||||
echo "$audio_url" > "$output"
|
||||
echo "URL written to: $output"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Print extra info if present
|
||||
local extra
|
||||
extra="$(echo "$response" | jq -r '.extra_info // .data.extra_info // empty' 2>/dev/null)" || true
|
||||
if [[ -n "$extra" && "$extra" != "null" ]]; then
|
||||
echo "Extra info: $extra"
|
||||
fi
|
||||
}
|
||||
|
||||
main "$@"
|
||||
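The prompt builder in `main` combines structured fields such as `Genre:` and `Mood:` into one sentence-like string. In bash, `"${array[*]}"` joins elements with only the first character of `IFS`, so a two-character separator such as `. ` needs an explicit loop or `printf`. A minimal sketch, assuming a hypothetical helper `join_with` that is not part of the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of generate_music.sh.
# join_with SEP ITEM...: print the items joined by a separator of any length.
join_with() {
  local sep="$1"; shift
  local out="" item first=true
  for item in "$@"; do
    if $first; then
      out="$item"
      first=false
    else
      out+="$sep$item"
    fi
  done
  printf '%s\n' "$out"
}

join_with '. ' "Genre: pop" "Mood: happy"   # prints: Genre: pop. Mood: happy
```

Called with no items, the helper prints an empty line, so callers can still test the result with `[[ -n ... ]]`.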
934
skills/minimax-multimodal-toolkit/scripts/tts/generate_voice.sh
Executable file
@@ -0,0 +1,934 @@
|
||||
#!/usr/bin/env bash
|
||||
# MiniMax Voice CLI — Unified TTS command-line interface (pure bash)
|
||||
#
|
||||
# Usage:
|
||||
# bash scripts/tts/generate_voice.sh tts "Hello world" -o hello.mp3
|
||||
# bash scripts/tts/generate_voice.sh clone my_voice.mp3 --voice-id my-custom-voice
|
||||
# bash scripts/tts/generate_voice.sh design "A gentle female voice" --voice-id designed-voice-1
|
||||
# bash scripts/tts/generate_voice.sh list-voices
|
||||
# bash scripts/tts/generate_voice.sh validate segments.json
|
||||
# bash scripts/tts/generate_voice.sh generate segments.json -o output.mp3
|
||||
# bash scripts/tts/generate_voice.sh merge file1.mp3 file2.mp3 -o combined.mp3
|
||||
# bash scripts/tts/generate_voice.sh convert input.wav -o output.mp3
|
||||
# bash scripts/tts/generate_voice.sh check-env
|
||||
set -euo pipefail
|
||||
|
||||
# ============================================================================
|
||||
# Configuration
|
||||
# ============================================================================
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
# ============================================================================
|
||||
# Common functions
|
||||
# ============================================================================
|
||||
|
||||
load_env() {
|
||||
local env_file
|
||||
for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
|
||||
if [[ -f "$env_file" ]]; then
|
||||
while IFS= read -r line || [[ -n "$line" ]]; do
|
||||
line="${line%%#*}" # strip comments
|
||||
line="$(echo "$line" | xargs)" # trim whitespace
|
||||
[[ -z "$line" || "$line" != *=* ]] && continue
|
||||
local key="${line%%=*}"
|
||||
local val="${line#*=}"
|
||||
key="$(echo "$key" | xargs)"
|
||||
val="$(echo "$val" | xargs)"
|
||||
# Remove surrounding quotes
|
||||
if [[ ${#val} -ge 2 ]]; then
|
||||
case "$val" in
|
||||
\"*\") val="${val:1:${#val}-2}" ;;
|
||||
\'*\') val="${val:1:${#val}-2}" ;;
|
||||
esac
|
||||
fi
|
||||
# Only set if not already in environment
|
||||
if [[ -z "${!key:-}" ]]; then
|
||||
export "$key=$val"
|
||||
fi
|
||||
done < "$env_file"
|
||||
return 0
|
||||
fi
|
||||
done
|
||||
return 0
|
||||
}
|
||||
|
||||
check_api_key() {
|
||||
if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
|
||||
echo "Error: MINIMAX_API_KEY environment variable is not set" >&2
|
||||
echo " export MINIMAX_API_KEY='your-key'" >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
ensure_dir() {
|
||||
local dir="$1"
|
||||
[[ -n "$dir" ]] && mkdir -p "$dir"
|
||||
}
|
||||
|
||||
API_BASE="${MINIMAX_API_HOST:-https://api.minimaxi.com}/v1"
|
||||
|
||||
api_request() {
|
||||
# api_request METHOD ENDPOINT [JSON_BODY]
|
||||
# Outputs raw JSON response to stdout.
|
||||
local method="$1" endpoint="$2" body="${3:-}"
|
||||
local url="${API_BASE}/${endpoint#/}"
|
||||
|
||||
local args=(
|
||||
-s -w "\n%{http_code}"
|
||||
-X "$method"
|
||||
-H "Authorization: Bearer ${MINIMAX_API_KEY}"
|
||||
-H "Accept-Encoding: gzip, deflate"
|
||||
--compressed
|
||||
--max-time 120
|
||||
)
|
||||
if [[ -n "$body" ]]; then
|
||||
args+=(-H "Content-Type: application/json" -d "$body")
|
||||
fi
|
||||
args+=("$url")
|
||||
|
||||
local output http_code response
|
||||
output="$(curl "${args[@]}" 2>/dev/null)" || {
|
||||
echo "Error: curl request failed" >&2
|
||||
exit 1
|
||||
}
|
||||
http_code="${output##*$'\n'}"
|
||||
response="${output%$'\n'*}"
|
||||
|
||||
if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
|
||||
echo "Error: API returned HTTP $http_code" >&2
|
||||
echo "$response" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check API-level error
|
||||
local status_code
|
||||
status_code="$(echo "$response" | jq -r '.base_resp.status_code // 0' 2>/dev/null)" || true
|
||||
if [[ "$status_code" != "0" && -n "$status_code" ]]; then
|
||||
local status_msg
|
||||
status_msg="$(echo "$response" | jq -r '.base_resp.status_msg // "Unknown error"')"
|
||||
echo "Error: API error [$status_code]: $status_msg" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "$response"
|
||||
}
|
||||
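`api_request` asks curl to append the HTTP status on its own final line (`-w "\n%{http_code}"`) and then separates status and body with parameter expansion. That splitting step can be exercised without a network call. A minimal sketch, assuming a hypothetical function `split_http_response` and sample input invented for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "BODY\nSTATUS" as produced by
#   curl -s -w "\n%{http_code}" ...
# Sets the global variables HTTP_BODY and HTTP_STATUS.
split_http_response() {
  local raw="$1"
  HTTP_STATUS="${raw##*$'\n'}"   # text after the last newline
  HTTP_BODY="${raw%$'\n'*}"      # text before the last newline
}

# Sample input for illustration only; the body may itself contain newlines.
split_http_response $'{\n  "base_resp": {"status_code": 0}\n}\n200'
echo "status=$HTTP_STATUS"   # prints: status=200
```

Because `##` removes the longest prefix and `%` the shortest suffix, only the final line is treated as the status even when the JSON body spans several lines.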
|
||||
api_upload() {
|
||||
# api_upload ENDPOINT FILE_PATH PURPOSE
|
||||
local endpoint="$1" file_path="$2" purpose="$3"
|
||||
local url="${API_BASE}/${endpoint#/}"
|
||||
|
||||
local output http_code response
|
||||
output="$(curl -s -w "\n%{http_code}" \
|
||||
-X POST \
|
||||
-H "Authorization: Bearer ${MINIMAX_API_KEY}" \
|
||||
-H "Accept-Encoding: gzip, deflate" \
|
||||
--compressed \
|
||||
-F "file=@${file_path}" \
|
||||
-F "purpose=${purpose}" \
|
||||
--max-time 120 \
|
||||
"$url" 2>/dev/null)" || {
|
||||
echo "Error: curl upload failed" >&2
|
||||
exit 1
|
||||
}
|
||||
http_code="${output##*$'\n'}"
|
||||
response="${output%$'\n'*}"
|
||||
|
||||
if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
|
||||
echo "Error: API returned HTTP $http_code" >&2
|
||||
echo "$response" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local status_code
|
||||
status_code="$(echo "$response" | jq -r '.base_resp.status_code // 0' 2>/dev/null)" || true
|
||||
if [[ "$status_code" != "0" && -n "$status_code" ]]; then
|
||||
local status_msg
|
||||
status_msg="$(echo "$response" | jq -r '.base_resp.status_msg // "Unknown error"')"
|
||||
echo "Error: API error [$status_code]: $status_msg" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "$response"
|
||||
}
|
||||
|
||||
hex_to_file() {
|
||||
# hex_to_file HEX_STRING OUTPUT_PATH
|
||||
local hex="$1" output="$2"
|
||||
ensure_dir "$(dirname "$output")"
|
||||
echo "$hex" | xxd -r -p > "$output"
|
||||
}
|
||||
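`hex_to_file` relies on `xxd -r -p` to turn the hex string returned by the API into bytes. `xxd` is distributed with Vim and may be absent on minimal systems. A sketch of a pure-bash fallback, assuming a hypothetical function `hex_decode`; it is slow for large audio payloads and is meant only to illustrate the conversion:

```bash
#!/usr/bin/env bash
# Hypothetical fallback for illustration; not part of generate_voice.sh.
# hex_decode HEX: write the bytes encoded by HEX to stdout.
hex_decode() {
  local hex="$1" i
  # Reject odd-length input and anything that is not a hex digit.
  if (( ${#hex} % 2 != 0 )) || [[ "$hex" =~ [^0-9A-Fa-f] ]]; then
    echo "hex_decode: invalid hex input" >&2
    return 1
  fi
  for (( i = 0; i < ${#hex}; i += 2 )); do
    # \xHH in a printf format string emits the byte with that value.
    printf "\\x${hex:i:2}"
  done
}

hex_decode 68656c6c6f   # prints: hello
```

Redirecting the function's output to a file (`hex_decode "$hex" > out.mp3`) would mirror what `hex_to_file` does with `xxd`.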
|
||||
# ============================================================================
|
||||
# Subcommand: tts
|
||||
# ============================================================================
|
||||
cmd_tts() {
|
||||
local text="" voice_id="male-qn-qingse" output="" model="speech-2.8-hd"
|
||||
local speed=1.0 volume=1.0 pitch=0 emotion="" audio_format="mp3"
|
||||
local sample_rate=32000 language_boost=""
|
||||
|
||||
# First positional arg is text
|
||||
if [[ $# -gt 0 && "$1" != -* ]]; then
|
||||
text="$1"; shift
|
||||
fi
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
-v|--voice-id) voice_id="$2"; shift 2 ;;
|
||||
-o|--output) output="$2"; shift 2 ;;
|
||||
--model) model="$2"; shift 2 ;;
|
||||
--speed) speed="$2"; shift 2 ;;
|
||||
--volume) volume="$2"; shift 2 ;;
|
||||
--pitch) pitch="$2"; shift 2 ;;
|
||||
--emotion) emotion="$2"; shift 2 ;;
|
||||
--format) audio_format="$2"; shift 2 ;;
|
||||
--sample-rate) sample_rate="$2"; shift 2 ;;
|
||||
--language-boost) language_boost="$2"; shift 2 ;;
|
||||
*) text="$1"; shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$text" ]]; then
|
||||
echo "Error: text is required" >&2
|
||||
echo "Usage: $(basename "$0") tts \"Text to speak\" -o output.mp3" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Build voice_setting
|
||||
local voice_setting
|
||||
voice_setting=$(jq -n \
|
||||
--arg vid "$voice_id" \
|
||||
--argjson spd "$speed" \
|
||||
--argjson vol "$volume" \
|
||||
--argjson pit "$pitch" \
|
||||
'{voice_id: $vid, speed: $spd, vol: $vol, pitch: $pit}')
|
||||
|
||||
if [[ -n "$emotion" ]]; then
|
||||
voice_setting=$(echo "$voice_setting" | jq --arg e "$emotion" '. + {emotion: $e}')
|
||||
fi
|
||||
|
||||
# Build payload
|
||||
local payload
|
||||
payload=$(jq -n \
|
||||
--arg model "$model" \
|
||||
--arg text "$text" \
|
||||
--argjson vs "$voice_setting" \
|
||||
--arg fmt "$audio_format" \
|
||||
--argjson sr "$sample_rate" \
|
||||
'{
|
||||
model: $model,
|
||||
text: $text,
|
||||
voice_setting: $vs,
|
||||
audio_setting: {sample_rate: $sr, bitrate: 128000, format: $fmt, channel: 1},
|
||||
stream: false,
|
||||
subtitle_enable: false,
|
||||
output_format: "hex"
|
||||
}')
|
||||
|
||||
if [[ -n "$language_boost" ]]; then
|
||||
payload=$(echo "$payload" | jq --arg lb "$language_boost" '. + {language_boost: $lb}')
|
||||
fi
|
||||
|
||||
echo "Synthesizing: ${text:0:50}..."
|
||||
local response
|
||||
response="$(api_request POST t2a_v2 "$payload")"
|
||||
|
||||
# Extract hex audio
|
||||
local audio_hex
|
||||
audio_hex="$(echo "$response" | jq -r '.data.audio // .extra_info.audio // empty')"
|
||||
|
||||
if [[ -z "$audio_hex" ]]; then
|
||||
echo "Error: No audio data returned from API" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ -n "$output" ]]; then
|
||||
hex_to_file "$audio_hex" "$output"
|
||||
echo "Done: $output"
|
||||
else
|
||||
echo "Generated ${#audio_hex} hex chars of audio"
|
||||
fi
|
||||
}
|
||||
|
||||
# ============================================================================
|
||||
# Subcommand: clone
|
||||
# ============================================================================
|
||||
cmd_clone() {
|
||||
local audio_file="" voice_id="" preview_text="" preview_output=""
|
||||
|
||||
# First positional arg is audio file
|
||||
if [[ $# -gt 0 && "$1" != -* ]]; then
|
||||
audio_file="$1"; shift
|
||||
fi
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--voice-id) voice_id="$2"; shift 2 ;;
|
||||
--preview) preview_text="$2"; shift 2 ;;
|
||||
--preview-output) preview_output="$2"; shift 2 ;;
|
||||
*) [[ -z "$audio_file" ]] && audio_file="$1"; shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$audio_file" ]]; then
|
||||
echo "Error: audio file is required" >&2
|
||||
echo "Usage: $(basename "$0") clone audio.mp3 --voice-id my-voice" >&2
|
||||
exit 1
|
||||
fi
|
||||
if [[ ! -f "$audio_file" ]]; then
|
||||
echo "Error: Audio file not found: $audio_file" >&2
|
||||
exit 1
|
||||
fi
|
||||
if [[ -z "$voice_id" ]]; then
|
||||
echo "Error: --voice-id is required" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Cloning voice from: $audio_file"
|
||||
echo "Voice ID: $voice_id"
|
||||
|
||||
# Step 1: Upload audio
|
||||
local upload_response file_id
|
||||
upload_response="$(api_upload files/upload "$audio_file" voice_clone)"
|
||||
file_id="$(echo "$upload_response" | jq -r '.file.file_id // .file_id // empty')"
|
||||
|
||||
if [[ -z "$file_id" ]]; then
|
||||
echo "Error: Upload succeeded but no file_id was returned" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Step 2: Clone voice
|
||||
local clone_payload
|
||||
clone_payload=$(jq -n \
|
||||
--arg vid "$voice_id" \
|
||||
--argjson fid "$file_id" \
|
||||
'{voice_id: $vid, file_id: $fid}')
|
||||
|
||||
api_request POST voice_clone "$clone_payload" > /dev/null
|
||||
echo "Voice cloned successfully: $voice_id"
|
||||
|
||||
# Step 3: Preview if requested
|
||||
if [[ -n "$preview_text" ]]; then
|
||||
echo "Generating preview..."
|
||||
local pout="${preview_output:-${voice_id}_preview.mp3}"
|
||||
cmd_tts "$preview_text" -v "$voice_id" -o "$pout"
|
||||
echo "Preview saved to: $pout"
|
||||
fi
|
||||
}
|
||||
|
||||
# ============================================================================
|
||||
# Subcommand: design
|
||||
# ============================================================================
|
||||
cmd_design() {
|
||||
local description="" voice_id="" preview_text="" preview_output=""
|
||||
|
||||
if [[ $# -gt 0 && "$1" != -* ]]; then
|
||||
description="$1"; shift
|
||||
fi
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--voice-id) voice_id="$2"; shift 2 ;;
|
||||
--preview) preview_text="$2"; shift 2 ;;
|
||||
--preview-output) preview_output="$2"; shift 2 ;;
|
||||
*) [[ -z "$description" ]] && description="$1"; shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$description" ]]; then
|
||||
echo "Error: description is required" >&2
|
||||
echo "Usage: $(basename "$0") design \"A warm female voice\" --voice-id narrator" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local ptext="${preview_text:-This is a preview of the designed voice.}"
|
||||
|
||||
echo "Designing voice from: \"$description\""
|
||||
[[ -n "$voice_id" ]] && echo "Voice ID: $voice_id"
|
||||
|
||||
local payload
|
||||
payload=$(jq -n \
|
||||
--arg prompt "$description" \
|
||||
--arg pt "$ptext" \
|
||||
'{prompt: $prompt, preview_text: $pt}')
|
||||
|
||||
if [[ -n "$voice_id" ]]; then
|
||||
payload=$(echo "$payload" | jq --arg vid "$voice_id" '. + {voice_id: $vid}')
|
||||
fi
|
||||
|
||||
local response
|
||||
response="$(api_request POST voice_design "$payload")"
|
||||
|
||||
local actual_voice_id
|
||||
actual_voice_id="${voice_id:-$(echo "$response" | jq -r '.voice_id // "unknown"')}"
|
||||
echo "Voice designed: $actual_voice_id"
|
||||
|
||||
local trial_audio
|
||||
trial_audio="$(echo "$response" | jq -r '.trial_audio // empty')"
|
||||
if [[ -n "$trial_audio" ]]; then
|
||||
local pout="${preview_output:-${actual_voice_id}_preview.mp3}"
|
||||
hex_to_file "$trial_audio" "$pout"
|
||||
echo "Preview saved to: $pout"
|
||||
fi
|
||||
}
|
||||
|
||||
# ============================================================================
|
||||
# Subcommand: list-voices
|
||||
# ============================================================================
|
||||
cmd_list_voices() {
|
||||
echo "=== System Voices ==="
|
||||
local sys_response
|
||||
sys_response="$(api_request POST voice/list '{"voice_type":"system"}' 2>/dev/null)" || true
|
||||
|
||||
if [[ -n "$sys_response" ]]; then
|
||||
local count
|
||||
count="$(echo "$sys_response" | jq '.voice_list | length')" 2>/dev/null || count=0
|
||||
if [[ "$count" -gt 0 ]]; then
|
||||
echo "$sys_response" | jq -r '.voice_list[:10][] | " \(.voice_id): \(.name // "N/A")"'
|
||||
if [[ "$count" -gt 10 ]]; then
|
||||
echo " ... and $((count - 10)) more"
|
||||
fi
|
||||
else
|
||||
echo " (None found)"
|
||||
fi
|
||||
else
|
||||
echo " (Could not fetch system voices)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "=== Custom Voices ==="
|
||||
|
||||
local clone_response design_response
|
||||
clone_response="$(api_request POST voice/list '{"voice_type":"voice_cloning"}' 2>/dev/null)" || true
|
||||
design_response="$(api_request POST voice/list '{"voice_type":"voice_generation"}' 2>/dev/null)" || true
|
||||
|
||||
local has_custom=false
|
||||
|
||||
if [[ -n "$clone_response" ]]; then
|
||||
local cc
|
||||
cc="$(echo "$clone_response" | jq '.voice_list | length')" 2>/dev/null || cc=0
|
||||
if [[ "$cc" -gt 0 ]]; then
|
||||
has_custom=true
|
||||
echo "Cloned ($cc):"
|
||||
echo "$clone_response" | jq -r '.voice_list[] | " \(.voice_id)"'
|
||||
fi
|
||||
fi
|
||||
|
||||
if [[ -n "$design_response" ]]; then
|
||||
local dc
|
||||
dc="$(echo "$design_response" | jq '.voice_list | length')" 2>/dev/null || dc=0
|
||||
if [[ "$dc" -gt 0 ]]; then
|
||||
has_custom=true
|
||||
echo "Designed ($dc):"
|
||||
echo "$design_response" | jq -r '.voice_list[] | " \(.voice_id)"'
|
||||
fi
|
||||
fi
|
||||
|
||||
if ! $has_custom; then
|
||||
echo " (None found)"
|
||||
fi
|
||||
}
|
||||
|
||||
# ============================================================================
|
||||
# Subcommand: validate
|
||||
# ============================================================================
|
||||
cmd_validate() {
|
||||
local segments_file="" model="speech-2.8-hd" strict=false verbose=false
|
||||
|
||||
if [[ $# -gt 0 && "$1" != -* ]]; then
|
||||
segments_file="$1"; shift
|
||||
fi
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--model) model="$2"; shift 2 ;;
|
||||
--strict) strict=true; shift ;;
|
||||
-v|--verbose) verbose=true; shift ;;
|
||||
--validate-voices) shift ;; # Not implemented in bash version
|
||||
*) [[ -z "$segments_file" ]] && segments_file="$1"; shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$segments_file" || ! -f "$segments_file" ]]; then
|
||||
echo "Error: Segments file not found: ${segments_file:-<none>}" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Validating: $segments_file"
|
||||
echo "Model: $model"
|
||||
|
||||
local valid_emotions="happy sad angry fearful disgusted surprised calm fluent whisper"
|
||||
echo "Valid emotions: $valid_emotions"
|
||||
echo ""
|
||||
|
||||
# Parse JSON
|
||||
local segments count
|
||||
segments="$(jq -r 'if type == "array" then . elif type == "object" and has("segments") then .segments else empty end' "$segments_file" 2>/dev/null)" || {
|
||||
echo "Error: Invalid JSON in $segments_file" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
if [[ -z "$segments" || "$segments" == "null" ]]; then
|
||||
echo "Error: No segments found in file" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
count="$(echo "$segments" | jq 'length')"
|
||||
local errors=0
|
||||
|
||||
for ((i=0; i<count; i++)); do
|
||||
local text voice_id emotion
|
||||
text="$(echo "$segments" | jq -r ".[$i].text // \"\"")"
|
||||
voice_id="$(echo "$segments" | jq -r ".[$i].voice_id // \"\"")"
|
||||
emotion="$(echo "$segments" | jq -r ".[$i].emotion // \"\"")"
|
||||
|
||||
if [[ -z "$text" ]]; then
|
||||
echo " - Segment $i: 'text' is required and must not be empty"
|
||||
errors=$((errors + 1))
|
||||
fi
|
||||
if [[ -z "$voice_id" ]]; then
|
||||
echo " - Segment $i: 'voice_id' is required"
|
||||
errors=$((errors + 1))
|
||||
fi
|
||||
if [[ -n "$emotion" ]]; then
|
||||
if ! echo "$valid_emotions" | grep -qwF -- "$emotion"; then
|
||||
echo " - Segment $i: invalid emotion '$emotion'"
|
||||
errors=$((errors + 1))
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
if [[ $errors -eq 0 ]]; then
|
||||
echo "Validation passed: $count segments"
|
||||
if $verbose; then
|
||||
echo ""
|
||||
echo "=== Segment Summary ==="
|
||||
for ((i=0; i<count; i++)); do
|
||||
local text voice_id emotion
|
||||
text="$(echo "$segments" | jq -r ".[$i].text // \"\"")"
|
||||
voice_id="$(echo "$segments" | jq -r ".[$i].voice_id // \"\"")"
|
||||
emotion="$(echo "$segments" | jq -r ".[$i].emotion // \"\"")"
|
||||
local elabel="${emotion:-AUTO}"
|
||||
printf " %d: [%-10s] voice=%-20s \"%s\"\n" "$i" "${elabel^^}" "${voice_id:0:20}" "${text:0:40}"
|
||||
done
|
||||
fi
|
||||
return 0
|
||||
else
|
||||
echo "Validation failed ($errors errors)"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
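`cmd_validate` checks each segment's `emotion` against a space-separated allow list. An exact string comparison against each entry avoids any pattern interpretation of the value. A minimal sketch, assuming a hypothetical function `is_valid_emotion`; the list is copied from the script above:

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of generate_voice.sh.
VALID_EMOTIONS=(happy sad angry fearful disgusted surprised calm fluent whisper)

# is_valid_emotion NAME: succeed only if NAME equals one list entry exactly.
is_valid_emotion() {
  local candidate="$1" e
  for e in "${VALID_EMOTIONS[@]}"; do
    [[ "$e" == "$candidate" ]] && return 0
  done
  return 1
}

is_valid_emotion calm    && echo "calm: ok"
is_valid_emotion 'h.ppy' || echo "h.ppy: rejected"
```

Quoting the right-hand side of `==` inside `[[ ]]` forces a literal comparison, so values containing `.` or `*` are never treated as patterns.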
|
||||
# ============================================================================
|
||||
# Subcommand: generate (multi-segment pipeline)
|
||||
# ============================================================================
|
||||
cmd_generate() {
|
||||
local segments_file="" output="" model="speech-2.8-hd" crossfade=200
|
||||
local no_normalize=false temp_dir="" continue_on_error=false
|
||||
|
||||
if [[ $# -gt 0 && "$1" != -* ]]; then
|
||||
segments_file="$1"; shift
|
||||
fi
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
-o|--output) output="$2"; shift 2 ;;
|
||||
--model) model="$2"; shift 2 ;;
|
||||
--crossfade) crossfade="$2"; shift 2 ;;
|
||||
--no-normalize) no_normalize=true; shift ;;
|
||||
--temp-dir) temp_dir="$2"; shift 2 ;;
|
||||
--continue-on-error) continue_on_error=true; shift ;;
|
||||
*) [[ -z "$segments_file" ]] && segments_file="$1"; shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$segments_file" || ! -f "$segments_file" ]]; then
|
||||
echo "Error: Segments file not found: ${segments_file:-<none>}" >&2
|
||||
exit 1
|
||||
fi
|
||||
if [[ -z "$output" ]]; then
|
||||
echo "Error: -o/--output is required" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Validate first
|
||||
  echo "Validating segments file..."
  local segments count
  segments="$(jq -r 'if type == "array" then . elif type == "object" and has("segments") then .segments else empty end' "$segments_file")"
  count="$(echo "$segments" | jq 'length')"

  if [[ "$count" -eq 0 ]]; then
    echo "Error: No segments found" >&2
    exit 1
  fi
  echo "Found $count valid segments"
  echo ""

  # Setup temp dir
  if [[ -z "$temp_dir" ]]; then
    temp_dir="$(dirname "$(cd "$(dirname "$output")" 2>/dev/null && pwd || echo ".")/$(basename "$output")")/tmp"
  fi
  mkdir -p "$temp_dir"
  echo "Temp directory: $temp_dir"

  # Generate each segment
  local succeeded=0 failed=0
  local segment_files=()

  for ((i=0; i<count; i++)); do
    local text voice_id emotion speed vol pitch
    text="$(echo "$segments" | jq -r ".[$i].text")"
    voice_id="$(echo "$segments" | jq -r ".[$i].voice_id")"
    emotion="$(echo "$segments" | jq -r ".[$i].emotion // \"\"")"
    speed="$(echo "$segments" | jq -r ".[$i].speed // 1.0")"
    vol="$(echo "$segments" | jq -r ".[$i].volume // 1.0")"
    pitch="$(echo "$segments" | jq -r ".[$i].pitch // 0")"

    printf " Generating segment %d/%d: %s...\n" "$((i+1))" "$count" "${text:0:40}"

    local seg_output="$temp_dir/segment_$(printf '%04d' "$i").mp3"

    # Build voice_setting
    local voice_setting
    voice_setting=$(jq -n \
      --arg vid "$voice_id" \
      --argjson spd "$speed" \
      --argjson vol "$vol" \
      --argjson pit "$pitch" \
      '{voice_id: $vid, speed: $spd, vol: $vol, pitch: $pit}')
    if [[ -n "$emotion" ]]; then
      voice_setting=$(echo "$voice_setting" | jq --arg e "$emotion" '. + {emotion: $e}')
    fi

    local payload
    payload=$(jq -n \
      --arg model "$model" \
      --arg text "$text" \
      --argjson vs "$voice_setting" \
      '{
        model: $model,
        text: $text,
        voice_setting: $vs,
        audio_setting: {sample_rate: 32000, bitrate: 128000, format: "mp3", channel: 1},
        stream: false,
        output_format: "hex"
      }')

    local response audio_hex
    if response="$(api_request POST t2a_v2 "$payload" 2>&1)"; then
      audio_hex="$(echo "$response" | jq -r '.data.audio // .extra_info.audio // empty')"
      if [[ -n "$audio_hex" ]]; then
        hex_to_file "$audio_hex" "$seg_output"
        segment_files+=("$seg_output")
        succeeded=$((succeeded + 1))
        echo " ✓ Saved: $seg_output"
      else
        failed=$((failed + 1))
        echo " ✗ Error: No audio data in response"
        if ! $continue_on_error; then break; fi
      fi
    else
      failed=$((failed + 1))
      echo " ✗ Error: $response"
      if ! $continue_on_error; then break; fi
    fi
  done

  if [[ ${#segment_files[@]} -eq 0 ]]; then
    echo "Error: No segments were generated successfully" >&2
    exit 1
  fi

  # Merge segments
  ensure_dir "$(dirname "$output")"

  if [[ ${#segment_files[@]} -eq 1 ]]; then
    cp "${segment_files[0]}" "$output"
  else
    _merge_audio_files "$output" "$crossfade" "$no_normalize" "${segment_files[@]}"
  fi

  echo ""
  echo "Audio saved to: $output"
  echo " Generated: $succeeded/$count segments"
  echo ""
  echo " Intermediate files in: $temp_dir"
  echo " Delete with: rm -rf $temp_dir"
}

# ============================================================================
# Subcommand: merge
# ============================================================================
cmd_merge() {
  local output="" format="mp3" crossfade=300 normalize=true
  local input_files=()

  while [[ $# -gt 0 ]]; do
    case "$1" in
      -o|--output) output="$2"; shift 2 ;;
      --format) format="$2"; shift 2 ;;
      --crossfade) crossfade="$2"; shift 2 ;;
      --no-normalize) normalize=false; shift ;;
      *) input_files+=("$1"); shift ;;
    esac
  done

  if [[ ${#input_files[@]} -lt 2 ]]; then
    echo "Error: At least 2 input files required" >&2
    exit 1
  fi
  if [[ -z "$output" ]]; then
    echo "Error: -o/--output is required" >&2
    exit 1
  fi

  for f in "${input_files[@]}"; do
    if [[ ! -f "$f" ]]; then
      echo "Error: File not found: $f" >&2
      exit 1
    fi
  done

  echo "Merging ${#input_files[@]} files..."
  local no_norm="false"
  $normalize || no_norm="true"
  _merge_audio_files "$output" "$crossfade" "$no_norm" "${input_files[@]}"
  echo "Merged audio saved to: $output"
}

_merge_audio_files() {
  # _merge_audio_files OUTPUT CROSSFADE_MS NO_NORMALIZE FILE1 FILE2 ...
  local output="$1" crossfade_ms="$2" no_normalize="$3"
  shift 3
  local files=("$@")
  local n=${#files[@]}

  ensure_dir "$(dirname "$output")"

  if [[ "$crossfade_ms" -gt 0 && $n -ge 2 ]]; then
    # Use acrossfade filter for crossfade between segments
    local crossfade_sec
    crossfade_sec=$(echo "scale=3; $crossfade_ms / 1000" | bc)

    local inputs=()
    local filter_parts=()

    # Normalize every input to the same sample rate, format, and layout
    for ((i=0; i<n; i++)); do
      inputs+=(-i "${files[$i]}")
      filter_parts+=("[${i}:a]aresample=32000,aformat=sample_fmts=fltp:channel_layouts=mono[s${i}]")
    done

    # Build acrossfade chain: ((s0 x s1) x s2) x ... -> [merged]
    if [[ $n -eq 2 ]]; then
      filter_parts+=("[s0][s1]acrossfade=d=${crossfade_sec}[merged]")
    else
      filter_parts+=("[s0][s1]acrossfade=d=${crossfade_sec}[m1]")
      for ((i=2; i<n; i++)); do
        local prev="[m$((i-1))]"
        if [[ $i -eq $((n-1)) ]]; then
          filter_parts+=("${prev}[s${i}]acrossfade=d=${crossfade_sec}[merged]")
        else
          filter_parts+=("${prev}[s${i}]acrossfade=d=${crossfade_sec}[m${i}]")
        fi
      done
    fi

    local final_filter="[merged]aformat=sample_fmts=fltp"
    if [[ "$no_normalize" != "true" ]]; then
      final_filter+=",loudnorm=I=-16:TP=-1.5:LRA=11"
    fi
    final_filter+="[final]"
    filter_parts+=("$final_filter")

    local filter_complex
    filter_complex="$(IFS=';'; echo "${filter_parts[*]}")"

    if ffmpeg -y "${inputs[@]}" \
      -filter_complex "$filter_complex" \
      -map "[final]" \
      -ar 32000 -ac 1 -acodec libmp3lame \
      "$output" 2>/dev/null; then
      return 0
    fi
    echo " Crossfade merge failed, falling back to concat demuxer..." >&2
  fi

  # Fallback: concat demuxer (no crossfade)
  # mktemp templates end in X's so they also work with BSD/macOS mktemp,
  # which only substitutes trailing X's.
  local concat_file
  concat_file="$(mktemp /tmp/concat_XXXXXX)"
  for f in "${files[@]}"; do
    echo "file '$(cd "$(dirname "$f")" && pwd)/$(basename "$f")'" >> "$concat_file"
  done

  if [[ "$no_normalize" != "true" ]]; then
    local tmp_concat
    tmp_concat="$(mktemp /tmp/concat_out_XXXXXX)"
    # The temp file has no extension, so name the muxer explicitly with -f mp3
    ffmpeg -y -f concat -safe 0 -i "$concat_file" -c copy -f mp3 "$tmp_concat" 2>/dev/null
    ffmpeg -y -i "$tmp_concat" -af "loudnorm=I=-16:TP=-1.5:LRA=11" -acodec libmp3lame "$output" 2>/dev/null
    rm -f "$tmp_concat"
  else
    ffmpeg -y -f concat -safe 0 -i "$concat_file" -c copy "$output" 2>/dev/null
  fi

  rm -f "$concat_file"
}

# ============================================================================
# Subcommand: convert
# ============================================================================
cmd_convert() {
  local input_file="" output="" format="mp3" sample_rate="" bitrate="" channels=""

  if [[ $# -gt 0 && "$1" != -* ]]; then
    input_file="$1"; shift
  fi

  while [[ $# -gt 0 ]]; do
    case "$1" in
      -o|--output) output="$2"; shift 2 ;;
      --format) format="$2"; shift 2 ;;
      --sample-rate) sample_rate="$2"; shift 2 ;;
      --bitrate) bitrate="$2"; shift 2 ;;
      --channels) channels="$2"; shift 2 ;;
      *) [[ -z "$input_file" ]] && input_file="$1"; shift ;;
    esac
  done

  if [[ -z "$input_file" || ! -f "$input_file" ]]; then
    echo "Error: Input file not found: ${input_file:-<none>}" >&2
    exit 1
  fi
  if [[ -z "$output" ]]; then
    echo "Error: -o/--output is required" >&2
    exit 1
  fi

  ensure_dir "$(dirname "$output")"

  # Determine codec
  local codec="copy"
  case "$format" in
    mp3) codec="libmp3lame" ;;
    wav) codec="pcm_s16le" ;;
    flac) codec="flac" ;;
    ogg) codec="libvorbis" ;;
    aac) codec="aac" ;;
    m4a) codec="aac" ;;
    *) codec="copy" ;;
  esac

  local args=(-y -i "$input_file" -acodec "$codec")
  [[ -n "$sample_rate" ]] && args+=(-ar "$sample_rate")
  [[ -n "$channels" ]] && args+=(-ac "$channels")
  [[ -n "$bitrate" ]] && args+=(-b:a "$bitrate")
  args+=("$output")

  echo "Converting $input_file to $format..."
  ffmpeg "${args[@]}" 2>/dev/null
  echo "Converted audio saved to: $output"
}

# ============================================================================
# Subcommand: check-env
# ============================================================================
cmd_check_env() {
  local check_script="$SCRIPT_DIR/../check_environment.sh"
  if [[ -f "$check_script" ]]; then
    bash "$check_script" "$@"
  else
    echo "check_environment.sh not found" >&2
    exit 1
  fi
}

# ============================================================================
# Main dispatcher
# ============================================================================
usage() {
  cat <<'EOF'
MiniMax Voice CLI — Unified TTS interface

Usage:
  generate_voice.sh <command> [options]

Commands:
  tts          Basic text-to-speech
  clone        Clone voice from audio sample
  design       Design voice from description
  list-voices  List available voices
  validate     Validate segments.json file
  generate     Generate audio from segments.json
  merge        Merge multiple audio files
  convert      Convert audio format
  check-env    Check environment setup

Examples:
  generate_voice.sh tts "Hello world" -o hello.mp3
  generate_voice.sh tts "你好" -v female-shaonv -o hello_cn.mp3
  generate_voice.sh clone my_voice.mp3 --voice-id my-custom-voice
  generate_voice.sh design "A warm female voice" --voice-id narrator-1
  generate_voice.sh list-voices
  generate_voice.sh validate segments.json --verbose
  generate_voice.sh generate segments.json -o output.mp3
  generate_voice.sh merge part1.mp3 part2.mp3 -o combined.mp3
  generate_voice.sh convert input.wav -o output.mp3
  generate_voice.sh check-env --test-api
EOF
}

main() {
  load_env

  if [[ $# -eq 0 ]]; then
    usage
    exit 0
  fi

  local command="$1"; shift

  case "$command" in
    tts)
      check_api_key
      cmd_tts "$@"
      ;;
    clone)
      check_api_key
      cmd_clone "$@"
      ;;
    design)
      check_api_key
      cmd_design "$@"
      ;;
    list-voices)
      check_api_key
      cmd_list_voices "$@"
      ;;
    validate)
      cmd_validate "$@"
      ;;
    generate)
      check_api_key
      cmd_generate "$@"
      ;;
    merge)
      cmd_merge "$@"
      ;;
    convert)
      cmd_convert "$@"
      ;;
    check-env)
      cmd_check_env "$@"
      ;;
    -h|--help|help)
      usage
      ;;
    *)
      echo "Unknown command: $command" >&2
      usage >&2
      exit 1
      ;;
  esac
}

main "$@"
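
The crossfade path in `_merge_audio_files` folds N inputs into a left-to-right chain of `acrossfade` filters. The sketch below isolates that graph-building step so the resulting `-filter_complex` string can be inspected without running ffmpeg. `build_acrossfade_graph` is a hypothetical helper used only for illustration, not a function in `generate_voice.sh`; the label names (`s0`, `m1`, `merged`) mirror the ones the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of generate_voice.sh): print the
# -filter_complex string that crossfades N audio inputs with duration D.
build_acrossfade_graph() {
  local n="$1" d="$2" parts=() i prev out
  (( n >= 2 )) || { echo "need at least 2 inputs" >&2; return 1; }

  # Bring every input to the same sample rate, sample format, and layout.
  for ((i = 0; i < n; i++)); do
    parts+=("[${i}:a]aresample=32000,aformat=sample_fmts=fltp:channel_layouts=mono[s${i}]")
  done

  # Fold left to right: the output label of one acrossfade is the first
  # input of the next; the last stage writes to [merged].
  prev="[s0]"
  for ((i = 1; i < n; i++)); do
    if (( i == n - 1 )); then out="[merged]"; else out="[m${i}]"; fi
    parts+=("${prev}[s${i}]acrossfade=d=${d}${out}")
    prev="$out"
  done

  # Join the filter descriptions with ';' as ffmpeg expects.
  local IFS=';'
  echo "${parts[*]}"
}

# Show the graph for three inputs and a 0.3 s crossfade.
build_acrossfade_graph 3 0.3
```

Tracking the previous label in one variable removes the need for a separate two-input branch, at the cost of differing slightly from the structure used in the script.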
221
skills/minimax-multimodal-toolkit/scripts/video/add_bgm.sh
Executable file
@@ -0,0 +1,221 @@
#!/usr/bin/env bash
# Add background music to a video file (pure bash)
#
# Usage:
#   bash scripts/video/add_bgm.sh --video input.mp4 --audio bgm.mp3 -o output.mp4
#   bash scripts/video/add_bgm.sh --video input.mp4 --generate-bgm --music-prompt "upbeat pop" -o output.mp4
#   bash scripts/video/add_bgm.sh --video input.mp4 --audio bgm.mp3 --replace-audio -o output.mp4
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

MUSIC_API_URL="${MINIMAX_API_HOST:-https://api.minimaxi.com}/v1/music_generation"

# ============================================================================
# Common functions
# ============================================================================

load_env() {
  local env_file
  for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
    if [[ -f "$env_file" ]]; then
      while IFS= read -r line || [[ -n "$line" ]]; do
        line="${line%%#*}"; line="$(echo "$line" | xargs)"
        [[ -z "$line" || "$line" != *=* ]] && continue
        local key="${line%%=*}" val="${line#*=}"
        key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
        if [[ ${#val} -ge 2 ]]; then
          case "$val" in \"*\") val="${val:1:${#val}-2}" ;; \'*\') val="${val:1:${#val}-2}" ;; esac
        fi
        # Use if/fi rather than `[[ ... ]] && export`: when the key is already
        # set, a false test as the loop's last command would make load_env
        # return 1 and abort the script under `set -e`.
        if [[ -z "${!key:-}" ]]; then export "$key=$val"; fi
      done < "$env_file"
    fi
  done
}

get_video_duration() {
  ffprobe -v error -show_entries format=duration -of json "$1" 2>/dev/null | jq -r '.format.duration'
}

video_has_audio() {
  local out
  out="$(ffprobe -v error -select_streams a -show_entries stream=codec_type -of csv=p=0 "$1" 2>/dev/null)"
  [[ "$out" == *audio* ]]
}

generate_music() {
  local prompt="$1" output_path="$2" instrumental="${3:-false}"

  local payload
  local effective_prompt="${prompt:-background music, cinematic, ambient}"

  if [[ "$instrumental" == "true" ]]; then
    payload=$(jq -n \
      --arg p "$effective_prompt. pure music, no lyrics" \
      '{model: "music-2.5", prompt: $p, lyrics: "[intro] [outro]", output_format: "url"}')
  else
    payload=$(jq -n \
      --arg p "$effective_prompt" \
      '{model: "music-2.5", prompt: $p, lyrics: "[Intro]\nla da da\nla la la", output_format: "url"}')
  fi

  # instrumental is always "true" or "false", so compare its value;
  # ${instrumental:+...} would expand for either one.
  if [[ "$instrumental" == "true" ]]; then
    echo "Generating instrumental music..."
  else
    echo "Generating music..."
  fi
  # Report the prompt actually sent, including the default
  echo " Prompt: $effective_prompt"

  local raw http_code response
  raw="$(curl -s -w "\n%{http_code}" -X POST "$MUSIC_API_URL" \
    -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
    -H "Content-Type: application/json" \
    --max-time 300 -d "$payload")"
  http_code="${raw##*$'\n'}"; response="${raw%$'\n'*}"

  [[ "$http_code" -ge 400 ]] 2>/dev/null && { echo "Error: Music API HTTP $http_code" >&2; return 1; }

  local sc
  sc="$(echo "$response" | jq -r '.base_resp.status_code // 0')" 2>/dev/null || true
  [[ "$sc" != "0" && -n "$sc" ]] && { echo "Error: Music API error: $(echo "$response" | jq '.base_resp')" >&2; return 1; }

  local audio_url
  audio_url="$(echo "$response" | jq -r '.data.audio_url // .data.audio // .data.audio_file.download_url // empty')"
  [[ -z "$audio_url" ]] && { echo "Error: No audio URL in music response" >&2; return 1; }

  mkdir -p "$(dirname "$output_path")"

  # Download with retry; -f makes curl fail on HTTP errors so they are retried
  local attempt
  for attempt in 1 2 3; do
    if curl -f -s -o "$output_path" --max-time 120 "$audio_url" 2>/dev/null; then
      local size; size="$(wc -c < "$output_path" | tr -d ' ')"
      echo " Downloaded: $output_path ($size bytes)"
      return 0
    fi
    if [[ $attempt -lt 3 ]]; then
      local wait=$((2 ** attempt))
      echo " Download attempt $attempt failed. Retrying in ${wait}s..."
      sleep "$wait"
    fi
  done
  echo "Error: Download failed after 3 attempts" >&2
  return 1
}

# ============================================================================
# Main
# ============================================================================

main() {
  load_env

  local video="" audio="" output=""
  local generate_bgm=false instrumental=false music_prompt=""
  local bgm_volume=0.3 fade_in=0 fade_out=0 replace_audio=false

  while [[ $# -gt 0 ]]; do
    case "$1" in
      --video) video="$2"; shift 2 ;;
      --audio) audio="$2"; shift 2 ;;
      --generate-bgm) generate_bgm=true; shift ;;
      --instrumental) instrumental=true; shift ;;
      --music-prompt) music_prompt="$2"; shift 2 ;;
      --bgm-volume) bgm_volume="$2"; shift 2 ;;
      --fade-in) fade_in="$2"; shift 2 ;;
      --fade-out) fade_out="$2"; shift 2 ;;
      --replace-audio) replace_audio=true; shift ;;
      -o|--output) output="$2"; shift 2 ;;
      -h|--help)
        cat <<'USAGE'
Add Background Music to Video

Usage:
  add_bgm.sh --video INPUT --audio BGM -o OUTPUT
  add_bgm.sh --video INPUT --generate-bgm --music-prompt "style" -o OUTPUT

Options:
  --video FILE         Input video file (required)
  --audio FILE         Background music audio file
  --generate-bgm       Generate BGM via MiniMax API
  --instrumental       Make generated BGM instrumental
  --music-prompt TEXT  Prompt for BGM generation
  --bgm-volume FLOAT   BGM volume level (default: 0.3)
  --fade-in SECS       BGM fade-in duration
  --fade-out SECS      BGM fade-out duration
  --replace-audio      Replace original audio instead of mixing
  -o, --output FILE    Output video file (required)

Examples:
  add_bgm.sh --video input.mp4 --audio bgm.mp3 -o output.mp4
  add_bgm.sh --video input.mp4 --generate-bgm --music-prompt "upbeat pop" -o output.mp4
  add_bgm.sh --video input.mp4 --audio bgm.mp3 --replace-audio -o output.mp4
USAGE
        exit 0
        ;;
      *) echo "Unknown option: $1" >&2; exit 1 ;;
    esac
  done

  if [[ -z "$video" || ! -f "$video" ]]; then
    echo "Error: Video file not found: ${video:-<none>}" >&2; exit 1
  fi
  if [[ -z "$audio" && "$generate_bgm" != "true" ]]; then
    echo "Error: Provide --audio or --generate-bgm" >&2; exit 1
  fi
  if [[ -z "$output" ]]; then
    echo "Error: --output / -o is required" >&2; exit 1
  fi

  local audio_path="$audio"

  if $generate_bgm; then
    if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
      echo "Error: MINIMAX_API_KEY not set." >&2; exit 1
    fi
    audio_path="${output%.*}_bgm.mp3"
    generate_music "$music_prompt" "$audio_path" "$instrumental" || exit 1
  fi

  if [[ ! -f "$audio_path" ]]; then
    echo "Error: Audio file not found: $audio_path" >&2; exit 1
  fi

  local duration
  duration="$(get_video_duration "$video")"
  echo "Video duration: $(printf '%.1f' "$duration")s"

  mkdir -p "$(dirname "$output")"

  local has_audio=false
  video_has_audio "$video" && has_audio=true

  local bgm_filter="[1:a]volume=${bgm_volume}"
  [[ "$(echo "$fade_in > 0" | bc -l)" == "1" ]] && bgm_filter+=",afade=t=in:d=${fade_in}"
  if [[ "$(echo "$fade_out > 0" | bc -l)" == "1" ]]; then
    local fo_start
    fo_start="$(echo "$duration - $fade_out" | bc -l)"
    [[ "$(echo "$fo_start < 0" | bc -l)" == "1" ]] && fo_start=0
    bgm_filter+=",afade=t=out:st=${fo_start}:d=${fade_out}"
  fi

  if $has_audio && ! $replace_audio; then
    bgm_filter+="[bgm];[0:a][bgm]amix=inputs=2:duration=first:dropout_transition=2[aout]"
    echo "Merging video + audio (mixing with original, bgm_volume=${bgm_volume})..."
    ffmpeg -y \
      -i "$video" -i "$audio_path" \
      -filter_complex "$bgm_filter" \
      -map 0:v -map "[aout]" \
      -c:v copy -c:a aac -shortest "$output" 2>/dev/null
  else
    bgm_filter+="[bgm]"
    # replace_audio is always "true" or "false", so branch on its value;
    # ${replace_audio:+...} would expand for either one.
    if $replace_audio; then
      echo "Merging video + audio (replacing original, bgm_volume=${bgm_volume})..."
    else
      echo "Merging video + audio (no original audio, bgm_volume=${bgm_volume})..."
    fi
    ffmpeg -y \
      -i "$video" -i "$audio_path" \
      -filter_complex "$bgm_filter" \
      -map 0:v -map "[bgm]" \
      -c:v copy -c:a aac -shortest "$output" 2>/dev/null
  fi

  echo "Output saved: $output"
  echo "Done!"
}

main "$@"
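
`generate_music` asks curl to append the HTTP status on its own line (`-w "\n%{http_code}"`) and then separates body and status with two parameter expansions. The sketch below runs that split on a simulated response so it can be checked offline. `split_http_response` and the `HTTP_CODE` / `HTTP_BODY` variables are hypothetical names introduced for this illustration; the script itself does the split inline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "<body>\n<status>" as produced by
#   curl -s -w "\n%{http_code}" ...
# Results are stored in the globals HTTP_CODE and HTTP_BODY.
split_http_response() {
  local raw="$1"
  HTTP_CODE="${raw##*$'\n'}"   # longest prefix up to the last newline removed
  HTTP_BODY="${raw%$'\n'*}"    # shortest suffix from the last newline removed
}

# Simulated curl output; no network access is needed.
raw=$'{"base_resp":{"status_code":0}}\n200'
split_http_response "$raw"
echo "code=$HTTP_CODE"
echo "body=$HTTP_BODY"
```

Because `##` strips the longest matching prefix and `%` the shortest matching suffix, the split happens at the last newline, so a body that itself contains newlines stays intact.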
479
skills/minimax-multimodal-toolkit/scripts/video/generate_long_video.sh
Executable file
@@ -0,0 +1,479 @@
#!/usr/bin/env bash
# MiniMax Long Video Generation CLI (pure bash)
#
# Generates multi-segment videos by chaining scenes together.
# Each segment's last frame becomes the next segment's first frame.
# Optionally adds AI-generated background music.
#
# Usage:
#   bash scripts/video/generate_long_video.sh \
#     --scenes "A sunrise" "Birds flying" "A calm lake" \
#     --output output/long_video.mp4
#
#   bash scripts/video/generate_long_video.sh \
#     --scenes "A robot waking up" "The robot walks outside" \
#     --music-prompt "cinematic orchestral" \
#     --output output/robot_story.mp4
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

API_BASE="${MINIMAX_API_HOST:-https://api.minimaxi.com}/v1"
MUSIC_API_URL="${API_BASE}/music_generation"
POLL_INTERVAL=10
MAX_WAIT_TIME=600
REQUEST_TIMEOUT=60
MAX_CONSECUTIVE_FAILURES=5

# ============================================================================
# Common functions
# ============================================================================

load_env() {
  local env_file
  for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
    if [[ -f "$env_file" ]]; then
      while IFS= read -r line || [[ -n "$line" ]]; do
        line="${line%%#*}"; line="$(echo "$line" | xargs)"
        [[ -z "$line" || "$line" != *=* ]] && continue
        local key="${line%%=*}" val="${line#*=}"
        key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
        if [[ ${#val} -ge 2 ]]; then
          case "$val" in \"*\") val="${val:1:${#val}-2}" ;; \'*\') val="${val:1:${#val}-2}" ;; esac
        fi
        # Use if/fi rather than `[[ ... ]] && export`: when the key is already
        # set, a false test as the loop's last command would make load_env
        # return 1 and abort the script under `set -e`.
        if [[ -z "${!key:-}" ]]; then export "$key=$val"; fi
      done < "$env_file"
    fi
  done
}

check_api_key() {
  if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
    echo "Error: MINIMAX_API_KEY not set." >&2; exit 1
  fi
}

image_to_data_url() {
  local path="$1"
  [[ -f "$path" ]] || { echo "Error: Image not found: $path" >&2; exit 1; }
  local mime; mime="$(file -b --mime-type "$path" 2>/dev/null)" || mime="image/jpeg"
  # GNU base64 wraps its output at 76 columns; strip the newlines so the
  # data URL is a single line.
  local b64; b64="$(base64 < "$path" | tr -d '\n')"
  echo "data:${mime};base64,${b64}"
}

resolve_image() {
  local input="$1"
  [[ -z "$input" ]] && return
  case "$input" in
    http://*|https://*|data:*) echo "$input" ;;
    *) image_to_data_url "$input" ;;
  esac
}

# ============================================================================
# Video API helpers (duplicated from generate_video.sh for standalone use)
# ============================================================================

_create_task() {
  local payload="$1"
  local raw http_code response
  raw="$(curl -s -w "\n%{http_code}" -X POST "${API_BASE}/video_generation" \
    -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
    -H "Content-Type: application/json" \
    --max-time "$REQUEST_TIMEOUT" -d "$payload")"
  http_code="${raw##*$'\n'}"; response="${raw%$'\n'*}"
  [[ "$http_code" -ge 400 ]] 2>/dev/null && { echo "Error: HTTP $http_code" >&2; echo "$response" >&2; exit 1; }
  local sc; sc="$(echo "$response" | jq -r '.base_resp.status_code // 0')" 2>/dev/null || true
  [[ "$sc" != "0" && -n "$sc" ]] && { echo "Error: $(echo "$response" | jq '.base_resp')" >&2; exit 1; }
  echo "$response" | jq -r '.task_id // empty'
}

_poll_task() {
  local task_id="$1" start_time cf=0
  start_time="$(date +%s)"
  while true; do
    local now=$(($(date +%s) - start_time))
    [[ $now -gt $MAX_WAIT_TIME ]] && { echo "Error: Timeout" >&2; exit 1; }
    local raw http_code response
    if raw="$(curl -s -w "\n%{http_code}" -G "${API_BASE}/query/video_generation" \
      -d "task_id=$task_id" -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
      --max-time "$REQUEST_TIMEOUT" 2>/dev/null)"; then
      http_code="${raw##*$'\n'}"; response="${raw%$'\n'*}"; cf=0
    else
      cf=$((cf+1)); [[ $cf -ge $MAX_CONSECUTIVE_FAILURES ]] && { echo "Error: Too many failures" >&2; exit 1; }
      sleep "$POLL_INTERVAL"; continue
    fi
    local status; status="$(echo "$response" | jq -r '.status // "Unknown"')"
    echo " [${now}s] Status: $status" >&2
    [[ "$status" == "Success" ]] && { echo "$response" | jq -r '.file_id // empty'; return 0; }
    [[ "$status" == "Fail" || "$status" == "Failed" || "$status" == "Error" ]] && { echo "Error: Task failed" >&2; exit 1; }
    sleep "$POLL_INTERVAL"
  done
}

_download_video() {
  local file_id="$1" output_path="$2"
  local raw; raw="$(curl -s -G "${API_BASE}/files/retrieve" -d "file_id=$file_id" \
    -H "Authorization: Bearer ${MINIMAX_API_KEY}" --max-time "$REQUEST_TIMEOUT")"
  local dl_url; dl_url="$(echo "$raw" | jq -r '.file.download_url // empty')"
  [[ -z "$dl_url" ]] && { echo "Error: No download_url" >&2; exit 1; }
  mkdir -p "$(dirname "$output_path")"
  # -f: fail on HTTP errors instead of saving the error page as the video
  curl -f -s -o "$output_path" --max-time $((REQUEST_TIMEOUT * 3)) "$dl_url"
  echo " Video saved: $output_path ($(wc -c < "$output_path" | tr -d ' ') bytes)" >&2
}

# ============================================================================
# FFmpeg helpers
# ============================================================================

get_video_duration() {
  ffprobe -v error -show_entries format=duration -of json "$1" 2>/dev/null | jq -r '.format.duration'
}

get_video_fps() {
  local fps_str num den
  fps_str="$(ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of csv=p=0 "$1" 2>/dev/null)" || { echo 25; return; }
  # Expect "num/den" (e.g. 30000/1001). Validate before dividing: an arithmetic
  # error inside $(( )) aborts the command, so `|| echo 25` would never run.
  if [[ "$fps_str" =~ ^([0-9]+)/([1-9][0-9]*)$ ]]; then
    num="${BASH_REMATCH[1]}"; den="${BASH_REMATCH[2]}"
    echo $(( (num + den / 2) / den ))
  else
    echo 25
  fi
}

video_has_audio() {
  local out
  out="$(ffprobe -v error -select_streams a -show_entries stream=codec_type -of csv=p=0 "$1" 2>/dev/null)"
  [[ "$out" == *audio* ]]
}

extract_last_frame() {
  local video_path="$1" output_image="$2"
  # Seek to 0.04 s before the end of the file and grab a single frame
  if ! ffmpeg -y -sseof -0.04 -i "$video_path" -frames:v 1 -q:v 2 "$output_image" 2>/dev/null; then
    echo "Warning: Could not extract last frame" >&2
    return 1
  fi
  [[ -f "$output_image" ]] || return 1
  echo " Extracted last frame: $output_image" >&2
}

concatenate_videos() {
  local output_path="$1" crossfade="$2"
  shift 2
  local video_paths=("$@")
  local n=${#video_paths[@]}

  if [[ $n -eq 1 ]]; then
    cp "${video_paths[0]}" "$output_path"
    return 0
  fi

  local fps
  fps="$(get_video_fps "${video_paths[0]}")"
  local has_audio=true
  for vp in "${video_paths[@]}"; do
    video_has_audio "$vp" || { has_audio=false; break; }
  done

  if [[ "$(echo "$crossfade > 0" | bc -l)" == "1" ]]; then
    # Get durations
    local durations=()
    for vp in "${video_paths[@]}"; do
      durations+=("$(get_video_duration "$vp")")
    done

    # Build inputs
    local inputs=()
    for vp in "${video_paths[@]}"; do
      inputs+=(-i "$(cd "$(dirname "$vp")" && pwd)/$(basename "$vp")")
    done

    # Calculate offsets: each transition starts one crossfade before the end
    # of the video built so far
    local offsets=() cumulative=0
    for ((i=0; i<n-1; i++)); do
      local offset
      offset="$(echo "$cumulative + ${durations[$i]} - $crossfade" | bc -l)"
      offsets+=("$offset")
      cumulative="$offset"
    done

    # Build filter
    local vf_parts=() af_parts=()
    if [[ $n -eq 2 ]]; then
      vf_parts+=("[0:v][1:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[0]}[vout]")
      $has_audio && af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[aout]")
    else
      vf_parts+=("[0:v][1:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[0]}[xv1]")
      $has_audio && af_parts+=("[0:a][1:a]acrossfade=d=${crossfade}:c1=tri:c2=tri[xa1]")
      for ((i=2; i<n; i++)); do
        local out_v="[xv${i}]" out_a="[xa${i}]"
        [[ $i -eq $((n-1)) ]] && { out_v="[vout]"; out_a="[aout]"; }
        vf_parts+=("[xv$((i-1))][${i}:v]xfade=transition=fade:duration=${crossfade}:offset=${offsets[$((i-1))]}${out_v}")
        $has_audio && af_parts+=("[xa$((i-1))][${i}:a]acrossfade=d=${crossfade}:c1=tri:c2=tri${out_a}")
      done
    fi

    local filter_complex
    filter_complex="$(IFS=';'; echo "${vf_parts[*]}${af_parts[*]:+;${af_parts[*]}}")"

    local cmd=(ffmpeg -y "${inputs[@]}" -filter_complex "$filter_complex" -map "[vout]")
    $has_audio && cmd+=(-map "[aout]")
    cmd+=(-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -r "$fps")
    $has_audio && cmd+=(-c:a aac -b:a 192k)
    cmd+=("$output_path")

    if "${cmd[@]}" 2>/dev/null; then
      echo "Concatenated $n segments -> $output_path" >&2
      return 0
    fi
    echo " Crossfade failed, falling back to re-encode concat..." >&2
  fi

  # Fallback: concat demuxer with re-encode
  # The mktemp template ends in X's so it also works with BSD/macOS mktemp,
  # which only substitutes trailing X's.
  local concat_file
  concat_file="$(mktemp /tmp/concat_XXXXXX)"
  for vp in "${video_paths[@]}"; do
    echo "file '$(cd "$(dirname "$vp")" && pwd)/$(basename "$vp")'" >> "$concat_file"
  done
  ffmpeg -y -f concat -safe 0 -i "$concat_file" \
    -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -r "$fps" \
    -c:a aac -b:a 192k "$output_path" 2>/dev/null
  rm -f "$concat_file"
  echo "Concatenated $n segments -> $output_path" >&2
}

merge_video_audio() {
  local video_path="$1" audio_path="$2" output_path="$3"
  local bgm_volume="${4:-0.3}" fade_in="${5:-0}" fade_out="${6:-0}"

  local duration
  duration="$(get_video_duration "$video_path")"

  local af="[1:a]volume=${bgm_volume}"
  [[ "$(echo "$fade_in > 0" | bc -l)" == "1" ]] && af+=",afade=t=in:d=${fade_in}"
  if [[ "$(echo "$fade_out > 0" | bc -l)" == "1" ]]; then
    local fo_start
    fo_start="$(echo "$duration - $fade_out" | bc -l)"
    [[ "$(echo "$fo_start < 0" | bc -l)" == "1" ]] && fo_start=0
    af+=",afade=t=out:st=${fo_start}:d=${fade_out}"
  fi
  af+="[bgm]"

  mkdir -p "$(dirname "$output_path")"
  ffmpeg -y -i "$video_path" -i "$audio_path" \
    -filter_complex "$af" \
    -map 0:v -map "[bgm]" \
    -c:v copy -c:a aac -shortest "$output_path" 2>/dev/null

  echo "Merged video+audio -> $output_path" >&2
}

generate_music_instrumental() {
  local prompt="$1" output_path="$2"
  local payload
  payload=$(jq -n \
    --arg p "${prompt:-cinematic background music, orchestral, ambient}. pure music, no lyrics" \
    '{model: "music-2.5", prompt: $p, lyrics: "[intro] [outro]", output_format: "url"}')

  echo "Generating instrumental music: $prompt" >&2
  local raw http_code response
  raw="$(curl -s -w "\n%{http_code}" -X POST "$MUSIC_API_URL" \
    -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
    -H "Content-Type: application/json" \
    --max-time 300 -d "$payload")"
  http_code="${raw##*$'\n'}"; response="${raw%$'\n'*}"
  [[ "$http_code" -ge 400 ]] 2>/dev/null && { echo "Error: Music API HTTP $http_code" >&2; return 1; }

  local audio_url
  audio_url="$(echo "$response" | jq -r '.data.audio_url // .data.audio // .data.audio_file.download_url // empty')"
  [[ -z "$audio_url" ]] && { echo "Error: No audio URL in music response" >&2; return 1; }

  mkdir -p "$(dirname "$output_path")"
  # -f: fail on HTTP errors instead of saving the error page as the audio file
  curl -f -s -o "$output_path" --max-time 120 "$audio_url"
  echo " Music saved: $output_path" >&2
}
|
||||
|
||||
# ============================================================================
|
||||
# Main
|
||||
# ============================================================================
|
||||
|
||||
main() {
    load_env
    check_api_key

    local scenes=() model="" segment_duration=10 resolution="768P"
    local first_frame="" subject_reference="" crossfade=0.5
    local music_prompt="" bgm_volume=0.3 fade_in=0 fade_out=0
    local output=""

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --scenes)
                shift
                # Stop at any option, including short ones such as -o
                while [[ $# -gt 0 && "$1" != -* ]]; do
                    scenes+=("$1"); shift
                done
                ;;
            --model) model="$2"; shift 2 ;;
            --segment-duration) segment_duration="$2"; shift 2 ;;
            --resolution) resolution="$2"; shift 2 ;;
            --first-frame) first_frame="$2"; shift 2 ;;
            --subject-reference) subject_reference="$2"; shift 2 ;;
            --crossfade) crossfade="$2"; shift 2 ;;
            --music-prompt) music_prompt="$2"; shift 2 ;;
            --bgm-volume) bgm_volume="$2"; shift 2 ;;
            --fade-in) fade_in="$2"; shift 2 ;;
            --fade-out) fade_out="$2"; shift 2 ;;
            -o|--output) output="$2"; shift 2 ;;
            -h|--help)
                cat <<'USAGE'
MiniMax Long Video Generation CLI

Usage:
  generate_long_video.sh --scenes "scene1" "scene2" ... -o OUTPUT

Options:
  --scenes TEXT...          Scene prompts, one per segment (at least one required)
  --model MODEL             Model name (default: auto)
  --segment-duration SECS   Duration per segment (default: 10)
  --resolution RES          Resolution: 768P, 1080P (default: 768P)
  --first-frame FILE        First frame for scene 1 (local file or URL)
  --subject-reference FILE  Subject reference image
  --crossfade SECS          Crossfade duration between scenes (default: 0.5)
  --music-prompt TEXT       Generate BGM with this prompt
  --bgm-volume FLOAT        BGM volume level (default: 0.3)
  --fade-in SECS            BGM fade-in duration (default: 0)
  --fade-out SECS           BGM fade-out duration (default: 0)
  -o, --output FILE         Output video file (required)

Examples:
  generate_long_video.sh --scenes "A sunrise" "Birds flying" "Sunset" -o long.mp4
  generate_long_video.sh --scenes "Scene 1" "Scene 2" --crossfade 1 --music-prompt "cinematic" -o movie.mp4
USAGE
                exit 0
                ;;
            *) echo "Unknown option: $1" >&2; exit 1 ;;
        esac
    done

    if [[ ${#scenes[@]} -eq 0 ]]; then
        echo "Error: --scenes is required" >&2; exit 1
    fi
    if [[ -z "$output" ]]; then
        echo "Error: --output / -o is required" >&2; exit 1
    fi

    local output_dir
    output_dir="$(dirname "$output")"
    mkdir -p "$output_dir"
    local tmpdir="$output_dir/tmp"
    mkdir -p "$tmpdir"
    echo "Temp directory: $tmpdir"

    local segment_paths=()
    local current_first_frame="$first_frame"

    echo "=== Generating ${#scenes[@]} video segments ==="
    echo ""

    for i in "${!scenes[@]}"; do
        local scene="${scenes[$i]}"
        echo "--- Segment $((i+1))/${#scenes[@]} ---"
        echo " Prompt: $scene"

        local seg_output="$tmpdir/segment_$(printf '%03d' "$i").mp4"

        # Determine mode
        local seg_mode="t2v"
        [[ -n "$current_first_frame" ]] && seg_mode="i2v"
        [[ -n "$subject_reference" && -z "$current_first_frame" ]] && seg_mode="ref"

        # Determine model
        local seg_model="$model"
        if [[ -z "$seg_model" ]]; then
            case "$seg_mode" in
                t2v|i2v) seg_model="MiniMax-Hailuo-2.3" ;;
                ref) seg_model="S2V-01" ;;
            esac
        fi

        # Build payload
        local payload
        payload=$(jq -n \
            --arg m "$seg_model" \
            --arg p "$scene" \
            --argjson d "$segment_duration" \
            --arg r "$resolution" \
            '{model: $m, prompt: $p, duration: $d, resolution: $r}')

        if [[ "$seg_mode" == "i2v" ]]; then
            local ff_url; ff_url="$(resolve_image "$current_first_frame")"
            payload=$(echo "$payload" | jq --arg ff "$ff_url" '. + {first_frame_image: $ff, prompt_optimizer: false}')
        elif [[ "$seg_mode" == "ref" ]]; then
            local si_url; si_url="$(resolve_image "$subject_reference")"
            payload=$(echo "$payload" | jq --arg si "$si_url" '. + {subject_reference: [{type: "character", image: [$si]}]}')
        fi

        # Generate segment
        local task_id file_id
        if task_id="$(_create_task "$payload")" && [[ -n "$task_id" ]]; then
            echo " Task created: $task_id"
            if file_id="$(_poll_task "$task_id")" && [[ -n "$file_id" ]]; then
                _download_video "$file_id" "$seg_output"
                segment_paths+=("$seg_output")

                # Extract last frame for next segment
                local last_frame_path="$tmpdir/last_frame_$(printf '%03d' "$i").jpg"
                if extract_last_frame "$seg_output" "$last_frame_path"; then
                    current_first_frame="$last_frame_path"
                else
                    current_first_frame=""
                fi
            else
                echo " Error: Polling failed for segment $((i+1))" >&2
                [[ ${#segment_paths[@]} -eq 0 ]] && exit 1
                break
            fi
        else
            echo " Error generating segment $((i+1))" >&2
            [[ ${#segment_paths[@]} -eq 0 ]] && exit 1
            break
        fi
    done

    if [[ ${#segment_paths[@]} -eq 0 ]]; then
        echo "Error: No segments were generated." >&2; exit 1
    fi

    # Concatenate
    local final_video="$output"
    [[ -n "$music_prompt" ]] && final_video="$tmpdir/concatenated.mp4"

    if [[ ${#segment_paths[@]} -eq 1 ]]; then
        cp "${segment_paths[0]}" "$final_video"
    else
        concatenate_videos "$final_video" "$crossfade" "${segment_paths[@]}"
    fi

    # Add BGM if requested
    if [[ -n "$music_prompt" ]]; then
        echo ""
        echo "--- Generating background music ---"
        local music_path="$tmpdir/bgm.mp3"
        if generate_music_instrumental "$music_prompt" "$music_path"; then
            merge_video_audio "$final_video" "$music_path" "$output" "$bgm_volume" "$fade_in" "$fade_out" || {
                echo "Warning: Failed to add BGM, using video without music" >&2
                [[ "$final_video" != "$output" ]] && cp "$final_video" "$output"
            }
        else
            echo "Warning: Failed to generate BGM" >&2
            [[ "$final_video" != "$output" ]] && cp "$final_video" "$output"
        fi
    fi

    echo ""
    echo "=== Done! Output: $output ==="
    echo " Intermediate files in: $tmpdir"
    echo " Delete with: rm -rf $tmpdir"
}

main "$@"
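The mode and model selection inside the segment loop above can be pulled out into a small pure function, which makes the precedence easy to check without calling the API. The sketch below is an illustration only: `pick_segment_mode` is a hypothetical helper name that does not exist in the script, and the model names are copied from the loop's defaults.

```shell
set -euo pipefail

# pick_segment_mode FIRST_FRAME SUBJECT_REFERENCE [MODEL_OVERRIDE]
# Hypothetical helper: prints "<mode> <model>" with the same precedence as
# the loop. A first frame wins over a subject reference; with neither
# image the segment is plain text-to-video.
pick_segment_mode() {
    local first_frame="${1:-}" subject_ref="${2:-}" model="${3:-}"
    local mode="t2v"
    if [[ -n "$first_frame" ]]; then
        mode="i2v"
    elif [[ -n "$subject_ref" ]]; then
        mode="ref"
    fi
    # Fall back to the per-mode default only when no model was given
    if [[ -z "$model" ]]; then
        case "$mode" in
            t2v|i2v) model="MiniMax-Hailuo-2.3" ;;
            ref)     model="S2V-01" ;;
        esac
    fi
    echo "$mode $model"
}

pick_segment_mode "" ""                                 # t2v MiniMax-Hailuo-2.3
pick_segment_mode "" "hero.jpg"                         # ref S2V-01
pick_segment_mode "tmp/last_frame_000.jpg" "hero.jpg"   # i2v MiniMax-Hailuo-2.3
```

One consequence visible here: once a segment succeeds and its last frame is extracted, every later segment runs in i2v mode, so `--subject-reference` only affects a segment that starts without a first frame.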
216
skills/minimax-multimodal-toolkit/scripts/video/generate_template_video.sh
Executable file
@@ -0,0 +1,216 @@
#!/usr/bin/env bash
# MiniMax Template Video Generation CLI (pure bash)
#
# Usage:
#   bash scripts/video/generate_template_video.sh \
#     --template-id T00001 \
#     --media image1.jpg image2.jpg \
#     --text "Title" "Subtitle" \
#     -o output/template_video.mp4
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

API_BASE="${MINIMAX_API_HOST:-https://api.minimaxi.com}/v1"
TEMPLATE_URL="${API_BASE}/video_template_generation"
QUERY_URL="${API_BASE}/query/video_template_generation"

POLL_INTERVAL=10
MAX_WAIT_TIME=600
REQUEST_TIMEOUT=60
MAX_CONSECUTIVE_FAILURES=5

# ============================================================================
# Common functions
# ============================================================================

load_env() {
    local env_file line
    for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
        if [[ -f "$env_file" ]]; then
            while IFS= read -r line || [[ -n "$line" ]]; do
                line="${line%%#*}"; line="$(echo "$line" | xargs)"
                [[ -z "$line" || "$line" != *=* ]] && continue
                local key="${line%%=*}" val="${line#*=}"
                key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
                if [[ ${#val} -ge 2 ]]; then
                    case "$val" in \"*\") val="${val:1:${#val}-2}" ;; \'*\') val="${val:1:${#val}-2}" ;; esac
                fi
                # Existing environment values win. Use `if` rather than `&&` so a
                # skipped key cannot make load_env return non-zero under `set -e`.
                if [[ -z "${!key:-}" ]]; then export "$key=$val"; fi
            done < "$env_file"
        fi
    done
}

check_api_key() {
    if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
        echo "Error: MINIMAX_API_KEY not set." >&2; exit 1
    fi
}

resolve_media_input() {
    local value="$1"
    case "$value" in
        http://*|https://*|data:*) echo "$value"; return ;;
    esac
    [[ -f "$value" ]] || { echo "Error: Media file not found: $value" >&2; exit 1; }
    local mime; mime="$(file -b --mime-type "$value" 2>/dev/null)" || mime="application/octet-stream"
    # Strip newlines: GNU base64 wraps its output at 76 columns by default
    local b64; b64="$(base64 < "$value" | tr -d '\n')"
    echo "data:${mime};base64,${b64}"
}

# ============================================================================
# Main
# ============================================================================

main() {
    load_env
    check_api_key

    local template_id="" output=""
    local media_inputs=() text_inputs=()

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --template-id) template_id="$2"; shift 2 ;;
            --media)
                shift
                # Stop at any option, including short ones such as -o
                while [[ $# -gt 0 && "$1" != -* ]]; do
                    media_inputs+=("$1"); shift
                done
                ;;
            --text)
                shift
                while [[ $# -gt 0 && "$1" != -* ]]; do
                    text_inputs+=("$1"); shift
                done
                ;;
            -o|--output) output="$2"; shift 2 ;;
            -h|--help)
                cat <<'USAGE'
MiniMax Template Video Generation CLI

Usage:
  generate_template_video.sh --template-id ID [--media FILE...] [--text TEXT...] -o OUTPUT

Options:
  --template-id ID     Template ID (required)
  --media FILE...      Media inputs (local files or URLs)
  --text TEXT...       Text inputs for template slots
  -o, --output FILE    Output video file (required)

Examples:
  generate_template_video.sh --template-id T00001 --media image1.jpg image2.jpg --text "Title" "Subtitle" -o video.mp4
USAGE
                exit 0
                ;;
            *) echo "Unknown option: $1" >&2; exit 1 ;;
        esac
    done

    if [[ -z "$template_id" ]]; then
        echo "Error: --template-id is required" >&2; exit 1
    fi
    if [[ -z "$output" ]]; then
        echo "Error: --output / -o is required" >&2; exit 1
    fi

    # Build payload
    local payload
    payload=$(jq -n --arg tid "$template_id" '{template_id: $tid}')

    # Add media inputs
    if [[ ${#media_inputs[@]} -gt 0 ]]; then
        local media_json="[]"
        for i in "${!media_inputs[@]}"; do
            local resolved
            resolved="$(resolve_media_input "${media_inputs[$i]}")"
            media_json=$(echo "$media_json" | jq --arg url "$resolved" '. + [{value: $url}]')
            echo " Media [$i]: ${media_inputs[$i]}"
        done
        payload=$(echo "$payload" | jq --argjson mi "$media_json" '. + {media_inputs: $mi}')
    fi

    # Add text inputs
    if [[ ${#text_inputs[@]} -gt 0 ]]; then
        local text_json="[]"
        for i in "${!text_inputs[@]}"; do
            text_json=$(echo "$text_json" | jq --arg t "${text_inputs[$i]}" '. + [{value: $t}]')
            echo " Text [$i]: ${text_inputs[$i]}"
        done
        payload=$(echo "$payload" | jq --argjson ti "$text_json" '. + {text_inputs: $ti}')
    fi

    # Create task
    echo "Creating template video task (template: $template_id)..."
    local raw http_code response
    raw="$(curl -s -w "\n%{http_code}" -X POST "$TEMPLATE_URL" \
        -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
        -H "Content-Type: application/json" \
        --max-time "$REQUEST_TIMEOUT" -d "$payload")"
    http_code="${raw##*$'\n'}"; response="${raw%$'\n'*}"

    [[ "$http_code" -ge 400 ]] 2>/dev/null && { echo "Error: HTTP $http_code" >&2; echo "$response" >&2; exit 1; }

    local sc
    sc="$(echo "$response" | jq -r '.base_resp.status_code // 0')" 2>/dev/null || true
    [[ "$sc" != "0" && -n "$sc" ]] && { echo "Error: $(echo "$response" | jq '.base_resp')" >&2; exit 1; }

    local task_id
    task_id="$(echo "$response" | jq -r '.task_id // empty')"
    [[ -z "$task_id" ]] && { echo "Error: No task_id in response" >&2; exit 1; }
    echo "Task created: $task_id"

    # Poll task
    echo "Polling task $task_id..."
    local start_time cf=0
    start_time="$(date +%s)"
    local video_url=""

    while true; do
        local elapsed=$(( $(date +%s) - start_time ))
        [[ $elapsed -gt $MAX_WAIT_TIME ]] && { echo "Error: Timeout" >&2; exit 1; }

        local poll_raw poll_code poll_resp
        if poll_raw="$(curl -s -w "\n%{http_code}" -G "$QUERY_URL" \
            -d "task_id=$task_id" \
            -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
            --max-time "$REQUEST_TIMEOUT" 2>/dev/null)"; then
            poll_code="${poll_raw##*$'\n'}"; poll_resp="${poll_raw%$'\n'*}"; cf=0
        else
            cf=$((cf+1))
            echo " Poll error ($cf/$MAX_CONSECUTIVE_FAILURES)"
            [[ $cf -ge $MAX_CONSECUTIVE_FAILURES ]] && { echo "Error: Too many failures" >&2; exit 1; }
            sleep "$POLL_INTERVAL"; continue
        fi

        local status
        status="$(echo "$poll_resp" | jq -r '.status // "Unknown"')"
        echo " [${elapsed}s] Status: $status"

        if [[ "$status" == "Success" ]]; then
            video_url="$(echo "$poll_resp" | jq -r '.video_url // empty')"
            [[ -z "$video_url" ]] && { echo "Error: No video_url in response" >&2; exit 1; }
            break
        fi

        [[ "$status" == "Fail" || "$status" == "Failed" || "$status" == "Error" ]] && {
            echo "Error: Task failed: $(echo "$poll_resp" | jq -r '.base_resp.status_msg // "Unknown"')" >&2
            exit 1
        }

        sleep "$POLL_INTERVAL"
    done

    # Download video directly from video_url
    echo "Downloading video..."
    mkdir -p "$(dirname "$output")"
    # -f makes curl fail on HTTP errors instead of saving the error body as the video
    curl -sf -o "$output" --max-time $((REQUEST_TIMEOUT * 3)) "$video_url" || {
        echo "Error: Failed to download video" >&2; exit 1
    }
    local size; size="$(wc -c < "$output" | tr -d ' ')"
    echo "Video saved to: $output ($size bytes)"
    echo "Done!"
}

main "$@"
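Local media files are sent to the API as base64 data URLs. Below is a minimal sketch of that encoding step, assuming a `base64` command that reads from stdin (GNU coreutils or macOS). `to_data_url` is a hypothetical name, and the MIME type is passed in by the caller here instead of being detected with `file`. GNU `base64` wraps its output at 76 columns by default, so the sketch strips newlines to keep the URL on one line.

```shell
set -euo pipefail

# to_data_url FILE [MIME]
# Hypothetical helper: prints a single-line data URL for FILE.
to_data_url() {
    local path="$1" mime="${2:-application/octet-stream}"
    [[ -f "$path" ]] || { echo "Error: file not found: $path" >&2; return 1; }
    local b64
    # Remove line wrapping so the encoded payload contains no newlines
    b64="$(base64 < "$path" | tr -d '\n')"
    printf 'data:%s;base64,%s\n' "$mime" "$b64"
}

sample="$(mktemp)"
printf 'hello' > "$sample"
to_data_url "$sample" "text/plain"   # data:text/plain;base64,aGVsbG8=
rm -f "$sample"
```

Data URLs grow by roughly a third over the original file size, so for large media a hosted URL is likely the more practical input.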
329
skills/minimax-multimodal-toolkit/scripts/video/generate_video.sh
Executable file
@@ -0,0 +1,329 @@
#!/usr/bin/env bash
# MiniMax Video Generation CLI (pure bash)
#
# Usage:
#   bash scripts/video/generate_video.sh --mode t2v --prompt "A cat playing piano" -o output/cat.mp4
#   bash scripts/video/generate_video.sh --mode i2v --prompt "Gentle breeze" --first-frame image.jpg -o output/anim.mp4
#   bash scripts/video/generate_video.sh --mode sef --first-frame start.jpg --last-frame end.jpg -o output/sef.mp4
#   bash scripts/video/generate_video.sh --mode ref --prompt "Person dancing" --subject-image person.jpg -o output/ref.mp4
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

API_BASE="${MINIMAX_API_HOST:-https://api.minimaxi.com}/v1"
POLL_INTERVAL=10
MAX_WAIT_TIME=600
REQUEST_TIMEOUT=60
MAX_CONSECUTIVE_FAILURES=5

# ============================================================================
# Common functions
# ============================================================================

load_env() {
    local env_file line
    for env_file in "$PROJECT_ROOT/.env" "$(pwd)/.env"; do
        if [[ -f "$env_file" ]]; then
            while IFS= read -r line || [[ -n "$line" ]]; do
                line="${line%%#*}"; line="$(echo "$line" | xargs)"
                [[ -z "$line" || "$line" != *=* ]] && continue
                local key="${line%%=*}" val="${line#*=}"
                key="$(echo "$key" | xargs)"; val="$(echo "$val" | xargs)"
                if [[ ${#val} -ge 2 ]]; then
                    case "$val" in \"*\") val="${val:1:${#val}-2}" ;; \'*\') val="${val:1:${#val}-2}" ;; esac
                fi
                # Existing environment values win. Use `if` rather than `&&` so a
                # skipped key cannot make load_env return non-zero under `set -e`.
                if [[ -z "${!key:-}" ]]; then export "$key=$val"; fi
            done < "$env_file"
        fi
    done
}

check_api_key() {
    if [[ -z "${MINIMAX_API_KEY:-}" ]]; then
        echo "Error: MINIMAX_API_KEY environment variable is not set." >&2; exit 1
    fi
}

image_to_data_url() {
    local path="$1"
    [[ -f "$path" ]] || { echo "Error: Image not found: $path" >&2; exit 1; }
    local mime
    mime="$(file -b --mime-type "$path" 2>/dev/null)" || mime="image/jpeg"
    local b64
    # Strip newlines: GNU base64 wraps its output at 76 columns by default
    b64="$(base64 < "$path" | tr -d '\n')"
    echo "data:${mime};base64,${b64}"
}

resolve_image() {
    local input="$1"
    [[ -z "$input" ]] && return
    case "$input" in
        http://*|https://*|data:*) echo "$input" ;;
        *) image_to_data_url "$input" ;;
    esac
}

# ============================================================================
# Video generation functions
# ============================================================================

create_task() {
    local payload="$1"
    echo "Creating video generation task..." >&2
    local raw_output http_code response
    raw_output="$(curl -s -w "\n%{http_code}" \
        -X POST "${API_BASE}/video_generation" \
        -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
        -H "Content-Type: application/json" \
        --max-time "$REQUEST_TIMEOUT" \
        -d "$payload")"
    http_code="${raw_output##*$'\n'}"
    response="${raw_output%$'\n'*}"

    if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
        echo "Error: API returned HTTP $http_code" >&2; echo "$response" >&2; exit 1
    fi

    local sc
    sc="$(echo "$response" | jq -r '.base_resp.status_code // 0')" 2>/dev/null || true
    if [[ "$sc" != "0" && -n "$sc" ]]; then
        echo "Error: API error: $(echo "$response" | jq '.base_resp')" >&2; exit 1
    fi

    local task_id
    task_id="$(echo "$response" | jq -r '.task_id // empty')"
    if [[ -z "$task_id" ]]; then
        echo "Error: No task_id in response" >&2; echo "$response" >&2; exit 1
    fi

    echo "Task created: $task_id" >&2
    echo "$task_id"
}

poll_task() {
    local task_id="$1"
    echo "Polling task $task_id..." >&2
    local start_time consecutive_failures=0
    start_time="$(date +%s)"

    while true; do
        local now elapsed
        now="$(date +%s)"
        elapsed=$((now - start_time))
        if [[ $elapsed -gt $MAX_WAIT_TIME ]]; then
            echo "Error: Task $task_id timed out after ${MAX_WAIT_TIME}s" >&2; exit 1
        fi

        local raw_output http_code response
        if raw_output="$(curl -s -w "\n%{http_code}" \
            -G "${API_BASE}/query/video_generation" \
            -d "task_id=$task_id" \
            -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
            --max-time "$REQUEST_TIMEOUT" 2>/dev/null)"; then
            http_code="${raw_output##*$'\n'}"
            response="${raw_output%$'\n'*}"
            consecutive_failures=0
        else
            consecutive_failures=$((consecutive_failures + 1))
            echo " Poll error ($consecutive_failures/$MAX_CONSECUTIVE_FAILURES)" >&2
            if [[ $consecutive_failures -ge $MAX_CONSECUTIVE_FAILURES ]]; then
                echo "Error: Too many consecutive poll failures" >&2; exit 1
            fi
            sleep "$POLL_INTERVAL"; continue
        fi

        local status
        status="$(echo "$response" | jq -r '.status // "Unknown"')"
        echo " [${elapsed}s] Status: $status" >&2

        if [[ "$status" == "Success" ]]; then
            local file_id
            file_id="$(echo "$response" | jq -r '.file_id // empty')"
            if [[ -z "$file_id" ]]; then
                echo "Error: Task succeeded but no file_id" >&2; exit 1
            fi
            echo "$file_id"
            return 0
        fi

        if [[ "$status" == "Fail" || "$status" == "Failed" || "$status" == "Error" ]]; then
            local err_msg
            err_msg="$(echo "$response" | jq -r '.base_resp.status_msg // "Unknown error"')"
            echo "Error: Task failed: $err_msg" >&2; exit 1
        fi

        sleep "$POLL_INTERVAL"
    done
}

download_video() {
    local file_id="$1" output_path="$2"
    echo "Retrieving file $file_id..." >&2

    local raw_output http_code response
    raw_output="$(curl -s -w "\n%{http_code}" \
        -G "${API_BASE}/files/retrieve" \
        -d "file_id=$file_id" \
        -H "Authorization: Bearer ${MINIMAX_API_KEY}" \
        --max-time "$REQUEST_TIMEOUT")"
    http_code="${raw_output##*$'\n'}"
    response="${raw_output%$'\n'*}"

    # The HTTP status was captured but never checked; report it like create_task does
    if [[ "$http_code" -ge 400 ]] 2>/dev/null; then
        echo "Error: API returned HTTP $http_code" >&2; echo "$response" >&2; exit 1
    fi

    local dl_url
    dl_url="$(echo "$response" | jq -r '.file.download_url // empty')"
    if [[ -z "$dl_url" ]]; then
        echo "Error: No download_url in file response" >&2; exit 1
    fi

    echo "Downloading video..." >&2
    mkdir -p "$(dirname "$output_path")"
    # -f makes curl fail on HTTP errors instead of saving the error body as the video
    curl -sf -o "$output_path" --max-time $((REQUEST_TIMEOUT * 3)) "$dl_url" || {
        echo "Error: Failed to download video" >&2; exit 1
    }
    local size
    size="$(wc -c < "$output_path" | tr -d ' ')"
    echo "Video saved to: $output_path ($size bytes)" >&2
}

# ============================================================================
# Main
# ============================================================================

main() {
    load_env
    check_api_key

    local mode="" prompt="" model="" duration=10 resolution="768P"
    local first_frame="" last_frame="" subject_image=""
    local prompt_optimizer="" fast_pretreatment="" callback_url="" aigc_watermark=""
    local output=""

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --mode) mode="$2"; shift 2 ;;
            --prompt) prompt="$2"; shift 2 ;;
            --model) model="$2"; shift 2 ;;
            --duration) duration="$2"; shift 2 ;;
            --resolution) resolution="$2"; shift 2 ;;
            --first-frame) first_frame="$2"; shift 2 ;;
            --last-frame) last_frame="$2"; shift 2 ;;
            --subject-image) subject_image="$2"; shift 2 ;;
            --prompt-optimizer) prompt_optimizer="$2"; shift 2 ;;
            --fast-pretreatment) fast_pretreatment="$2"; shift 2 ;;
            --callback-url) callback_url="$2"; shift 2 ;;
            --aigc-watermark) aigc_watermark="$2"; shift 2 ;;
            -o|--output) output="$2"; shift 2 ;;
            -h|--help)
                cat <<'USAGE'
MiniMax Video Generation CLI

Usage:
  generate_video.sh --mode MODE [options] -o OUTPUT

Modes:
  t2v   Text-to-video
  i2v   Image-to-video (requires --first-frame)
  sef   Start-end frame (requires --first-frame; --last-frame sets the end frame)
  ref   Subject reference (requires --subject-image)

Options:
  --mode MODE               Generation mode: t2v, i2v, sef, ref (required)
  --prompt TEXT             Text prompt describing the video
  --model MODEL             Model name (default depends on mode: MiniMax-Hailuo-2.3
                            for t2v/i2v, MiniMax-Hailuo-02 for sef, S2V-01 for ref)
  --duration SECS           Video duration in seconds (default: 10)
  --resolution RES          Output resolution (default: 768P)
  --first-frame FILE        First frame image (local file or URL)
  --last-frame FILE         Last frame image (local file or URL)
  --subject-image FILE      Subject reference image (local file or URL)
  --prompt-optimizer BOOL   Set prompt_optimizer in the request (true/false)
  --fast-pretreatment BOOL  Set fast_pretreatment in the request (i2v only, true/false)
  --callback-url URL        Callback URL passed to the API
  --aigc-watermark BOOL     Set aigc_watermark in the request (true/false)
  -o, --output FILE         Output video file (required)

Examples:
  generate_video.sh --mode t2v --prompt "A cat playing piano" -o cat.mp4
  generate_video.sh --mode i2v --prompt "Gentle breeze" --first-frame photo.jpg -o anim.mp4
  generate_video.sh --mode sef --first-frame start.jpg --last-frame end.jpg -o sef.mp4
  generate_video.sh --mode ref --prompt "Person dancing" --subject-image person.jpg -o ref.mp4
USAGE
                exit 0
                ;;
            *) echo "Unknown option: $1" >&2; exit 1 ;;
        esac
    done

    if [[ -z "$mode" ]]; then
        echo "Error: --mode is required (t2v, i2v, sef, ref)" >&2; exit 1
    fi
    if [[ -z "$output" ]]; then
        echo "Error: --output / -o is required" >&2; exit 1
    fi

    # Default model per mode
    if [[ -z "$model" ]]; then
        case "$mode" in
            t2v) model="MiniMax-Hailuo-2.3" ;;
            i2v) model="MiniMax-Hailuo-2.3" ;;
            sef) model="MiniMax-Hailuo-02" ;;
            ref) model="S2V-01" ;;
        esac
    fi

    # Build payload
    local payload
    payload=$(jq -n --arg m "$model" '{model: $m}')

    [[ -n "$prompt" ]] && payload=$(echo "$payload" | jq --arg p "$prompt" '. + {prompt: $p}')
    payload=$(echo "$payload" | jq --argjson d "$duration" '. + {duration: $d}')
    payload=$(echo "$payload" | jq --arg r "$resolution" '. + {resolution: $r}')

    [[ -n "$prompt_optimizer" ]] && payload=$(echo "$payload" | jq --argjson po "$(echo "$prompt_optimizer" | tr '[:upper:]' '[:lower:]' | jq -R 'test("true")')" '. + {prompt_optimizer: $po}')
    [[ -n "$callback_url" ]] && payload=$(echo "$payload" | jq --arg cu "$callback_url" '. + {callback_url: $cu}')
    [[ -n "$aigc_watermark" ]] && payload=$(echo "$payload" | jq --argjson aw "$aigc_watermark" '. + {aigc_watermark: $aw}')

    case "$mode" in
        t2v) ;;
        i2v)
            if [[ -z "$first_frame" ]]; then
                echo "Error: --first-frame is required for i2v mode" >&2; exit 1
            fi
            local ff_url
            ff_url="$(resolve_image "$first_frame")"
            payload=$(echo "$payload" | jq --arg ff "$ff_url" '. + {first_frame_image: $ff}')
            [[ -n "$fast_pretreatment" ]] && payload=$(echo "$payload" | jq --argjson fp "$(echo "$fast_pretreatment" | tr '[:upper:]' '[:lower:]' | jq -R 'test("true")')" '. + {fast_pretreatment: $fp}')
            ;;
        sef)
            if [[ -z "$first_frame" ]]; then
                echo "Error: --first-frame is required for sef mode" >&2; exit 1
            fi
            local ff_url
            ff_url="$(resolve_image "$first_frame")"
            payload=$(echo "$payload" | jq --arg ff "$ff_url" '. + {first_frame_image: $ff}')
            if [[ -n "$last_frame" ]]; then
                local lf_url
                lf_url="$(resolve_image "$last_frame")"
                payload=$(echo "$payload" | jq --arg lf "$lf_url" '. + {last_frame_image: $lf}')
            fi
            ;;
        ref)
            if [[ -z "$subject_image" ]]; then
                echo "Error: --subject-image is required for ref mode" >&2; exit 1
            fi
            local si_url
            si_url="$(resolve_image "$subject_image")"
            payload=$(echo "$payload" | jq --arg si "$si_url" '. + {subject_reference: [{type: "character", image: [$si]}]}')
            if [[ -n "$first_frame" ]]; then
                local ff_url
                ff_url="$(resolve_image "$first_frame")"
                payload=$(echo "$payload" | jq --arg ff "$ff_url" '. + {first_frame_image: $ff}')
            fi
            ;;
        *)
            echo "Error: Unknown mode: $mode" >&2; exit 1 ;;
    esac

    echo "Mode: $mode"
    echo "Model: $model"

    local task_id file_id
    task_id="$(create_task "$payload")"
    file_id="$(poll_task "$task_id")"
    download_video "$file_id" "$output"
    echo "Done!"
}

main "$@"
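The polling loop in `poll_task` mixes the HTTP call with the retry and timeout logic, which makes it hard to exercise without a live task. As a rough sketch under stated assumptions, the control flow can be separated from the transport by passing in the command that reports the status. `poll_until` and `fake_status` are hypothetical names for illustration; the terminal status strings are the ones the script already checks.

```shell
set -euo pipefail

# poll_until STATUS_CMD [INTERVAL] [MAX_WAIT]
# Hypothetical helper: runs STATUS_CMD until it prints a terminal status.
# Returns 0 on "Success", 1 on a failure status, 2 on timeout.
poll_until() {
    local status_cmd="$1" interval="${2:-10}" max_wait="${3:-600}"
    local start now status
    start="$(date +%s)"
    while true; do
        now="$(date +%s)"
        if (( now - start > max_wait )); then
            echo "Error: timed out after ${max_wait}s" >&2
            return 2
        fi
        # A failing status command is treated as a non-terminal status
        status="$("$status_cmd")" || status="Unknown"
        case "$status" in
            Success) return 0 ;;
            Fail|Failed|Error) return 1 ;;
        esac
        sleep "$interval"
    done
}

# Stand-in for the real query: reports Processing twice, then Success.
# The counter lives in a file because "$(...)" runs in a subshell.
fake_status() {
    local n
    n="$(cat "$COUNTER_FILE")"
    echo $((n + 1)) > "$COUNTER_FILE"
    if (( n < 2 )); then echo "Processing"; else echo "Success"; fi
}

COUNTER_FILE="$(mktemp)"
echo 0 > "$COUNTER_FILE"
poll_until fake_status 0 5 && echo "done after $(cat "$COUNTER_FILE") polls"   # done after 3 polls
```

In the real script the status command would wrap the `curl` query and the `jq` extraction of `.status`. Keeping it as a separate function also lets the template and standard video scripts share one loop.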