opencode-skill/skills/minimax-multimodal-toolkit/references/tts-guide.md
Kunthawat Greethong 7edf5bc4d0 feat: Import 35+ skills, merge duplicates, add openclaw installer
Major updates:
- Added 35+ new skills from awesome-opencode-skills and antigravity repos
- Merged SEO skills into seo-master
- Merged architecture skills into architecture
- Merged security skills into security-auditor and security-coder
- Merged testing skills into testing-master and testing-patterns
- Merged pentesting skills into pentesting
- Renamed website-creator to thai-frontend-dev
- Replaced skill-creator with github version
- Removed Chutes references (use MiniMax API instead)
- Added install-openclaw-skills.sh for cross-platform installation
- Updated .env.example with MiniMax API credentials
2026-03-26 11:37:39 +07:00


TTS Guide

Setup

cd skills/MiniMaxStudio
pip install -r requirements.txt
brew install ffmpeg   # macOS (or: sudo apt install ffmpeg)
export MINIMAX_API_KEY="your-api-key"   # sk-api-xxx or sk-cp-xxx
python scripts/check_environment.py
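The setup above needs two things before any TTS call works: an API key in the environment and FFmpeg on the PATH. The sketch below is a hypothetical stand-in for that kind of check, not the bundled scripts/check_environment.py. The key prefixes come from the comment on the export line; the function name and messages are illustrative assumptions.

```python
import os
import shutil


def check_environment(env=None, which=shutil.which):
    """Return a list of setup problems; an empty list means ready.

    Hypothetical helper for illustration, not scripts/check_environment.py.
    `env` and `which` are injectable so the check can run without a real
    environment or FFmpeg install.
    """
    env = os.environ if env is None else env
    problems = []

    key = env.get("MINIMAX_API_KEY", "")
    if not key:
        problems.append("MINIMAX_API_KEY is not set")
    elif not key.startswith(("sk-api-", "sk-cp-")):
        # Soft check based on the key formats mentioned in the setup comment.
        problems.append("MINIMAX_API_KEY does not start with sk-api- or sk-cp-")

    # FFmpeg is needed for merge/convert, so look for the binary on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling check_environment() with no arguments inspects the real environment and PATH.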

Quick Test

python scripts/tts/generate_voice.py tts "Hello, this is a test." -o test.mp3

Voice Management

List available voices:

python scripts/tts/generate_voice.py list-voices

Voice Cloning

Create a custom voice from an audio sample:

python scripts/tts/generate_voice.py clone audio.mp3 --voice-id my-custom-voice

# With preview
python scripts/tts/generate_voice.py clone audio.mp3 --voice-id my-voice --preview "Test text" --preview-output preview.mp3

Sample requirements: 10 s to 5 min long, ≤20 MB, mp3/wav/m4a format.
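To catch a rejected sample before uploading it, the limits above can be checked locally. This is a minimal sketch with a hypothetical function name; it assumes "20 MB" means 20 MiB and that the file extension reflects the actual format. The service may apply the limits differently.

```python
from pathlib import Path

# Limits as stated in the clone requirements; treat them as a pre-check only.
MIN_SECONDS = 10
MAX_SECONDS = 5 * 60
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" interpreted as 20 MiB
ALLOWED_SUFFIXES = {".mp3", ".wav", ".m4a"}


def clone_sample_problems(path, size_bytes, duration_seconds):
    """Return reasons a cloning sample would violate the stated limits."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 20 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration outside 10 s to 5 min")
    return problems
```

The size can come from os.path.getsize, and the duration from ffprobe, which ships with FFmpeg.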

Voice Design

Design a voice from a text description:

python scripts/tts/generate_voice.py design "A warm, gentle female voice" --voice-id designed-voice

Custom voices expire after 7 days if not used with TTS.

Audio Processing

Merge

python scripts/tts/generate_voice.py merge file1.mp3 file2.mp3 -o combined.mp3
python scripts/tts/generate_voice.py merge a.mp3 b.mp3 -o merged.mp3 --crossfade 300
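The --crossfade value is given in milliseconds. The sketch below shows what a linear crossfade does at the sample level: the end of the first clip fades out while the start of the second fades in over the overlap. It is an illustration of the technique under that assumption, not the merge command's implementation, which may rely on FFmpeg filters and different fade curves.

```python
def linear_crossfade(a, b, sample_rate, crossfade_ms):
    """Join two mono sample sequences with a linear crossfade.

    Illustrative only; the merge subcommand may implement this differently.
    """
    overlap = int(sample_rate * crossfade_ms / 1000)
    overlap = min(overlap, len(a), len(b))  # cannot overlap more than a clip
    head = list(a[: len(a) - overlap])      # part of `a` played alone
    tail = list(b[overlap:])                # part of `b` played alone
    mixed = []
    for i in range(overlap):
        # t rises from just above 0 to just below 1 across the overlap,
        # so `a` fades out while `b` fades in.
        t = (i + 1) / (overlap + 1)
        mixed.append(a[len(a) - overlap + i] * (1 - t) + b[i] * t)
    return head + mixed + tail
```

In this sketch the output is shorter than the two inputs combined by exactly the overlap, so --crossfade 300 at 32 kHz would overlap 9,600 samples.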

Convert

python scripts/tts/generate_voice.py convert input.wav -o output.mp3
python scripts/tts/generate_voice.py convert input.wav -o output.mp3 --format mp3 --bitrate 192k --sample-rate 32000

FFmpeg required. Supported formats: mp3, wav, flac, ogg, m4a, aac, wma, opus, pcm.
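Because conversion depends on FFmpeg, a roughly equivalent FFmpeg invocation can be built directly when the script is not available. This is a sketch under assumptions: the function name is hypothetical, the output format is inferred from the destination extension, and raw .pcm output is assumed to be 16-bit little-endian, which the guide does not specify.

```python
from pathlib import Path

# Output formats listed in the guide.
SUPPORTED_FORMATS = {"mp3", "wav", "flac", "ogg", "m4a", "aac", "wma", "opus", "pcm"}


def ffmpeg_convert_args(src, dst, bitrate=None, sample_rate=None):
    """Build an FFmpeg argument list roughly matching the convert subcommand."""
    fmt = Path(dst).suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    args = ["ffmpeg", "-y", "-i", str(src)]  # -y overwrites an existing output
    if bitrate:
        args += ["-b:a", bitrate]            # audio bitrate, e.g. "192k"
    if sample_rate:
        args += ["-ar", str(sample_rate)]    # output sample rate in Hz
    if fmt == "pcm":
        # Raw PCM has no container FFmpeg can infer from the extension;
        # s16le is an assumption, not taken from the guide.
        args += ["-f", "s16le"]
    args.append(str(dst))
    return args
```

Pass the returned list to subprocess.run(..., check=True) to perform the conversion.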

Segment-Based TTS

For multi-voice, multi-emotion workflows using a segments.json file:

# Validate
python scripts/tts/generate_voice.py validate segments.json --verbose

# Generate
python scripts/tts/generate_voice.py generate segments.json -o output.mp3 --crossfade 200

segments.json Format

[
  { "text": "Hello!", "voice_id": "female-shaonv", "emotion": "" },
  { "text": "How are you?", "voice_id": "male-qn-qingse", "emotion": "happy" }
]
  • text (required): Text to synthesize
  • voice_id (required): Voice ID
  • emotion (optional): Emotion label for the segment. For speech-2.8 models, leave it empty to let the model match emotion automatically. Valid values: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper
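The field rules above are simple enough to check before spending API calls. The sketch below is a hypothetical pre-check written from those rules only; the bundled validate subcommand may enforce more (or different) constraints.

```python
import json

# Emotion values listed above; "" means automatic matching on speech-2.8.
VALID_EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted",
                  "surprised", "calm", "fluent", "whisper"}


def segment_errors(segments):
    """Check a parsed segments.json value against the documented fields."""
    if not isinstance(segments, list):
        return ["top level must be a JSON array"]
    errors = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            errors.append(f"segment {i}: not an object")
            continue
        # text and voice_id are required, non-empty strings.
        for field in ("text", "voice_id"):
            value = seg.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"segment {i}: missing or empty {field!r}")
        # emotion is optional; an empty string is allowed.
        emotion = seg.get("emotion", "")
        if emotion and (not isinstance(emotion, str) or emotion not in VALID_EMOTIONS):
            errors.append(f"segment {i}: unknown emotion {emotion!r}")
    return errors


def check_file(path):
    """Load a segments file from disk and return its errors."""
    with open(path, encoding="utf-8") as fh:
        return segment_errors(json.load(fh))
```

For example, check_file("segments.json") returns an empty list for the two-segment sample shown above.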

Troubleshooting

| Error | Solution |
| --- | --- |
| MINIMAX_API_KEY is required | export MINIMAX_API_KEY="key" |
| FFmpeg not installed | brew install ffmpeg (macOS) or sudo apt install ffmpeg |
| Voice not found | Run python scripts/tts/generate_voice.py list-voices and use a listed voice ID |
| 401 Unauthorized | Check that the API key is valid |
| 429 Too Many Requests | Add delays between requests |
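One common way to add delays after a 429 is exponential backoff: wait, retry, and double the wait each time. The sketch below assumes the failing call raises an exception carrying a code attribute (as urllib.error.HTTPError does) or the hypothetical RateLimited class; the retry count and base delay are arbitrary illustrative defaults, not MiniMax-documented limits.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception for clients that do not expose an HTTP code."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            is_429 = getattr(exc, "code", None) == 429 or isinstance(exc, RateLimited)
            if not is_429 or attempt == max_retries:
                raise  # other errors, or retries exhausted
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the retry logic can be exercised without real waiting.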

API Details

  • Endpoint: POST /v1/t2a_v2
  • Base URL: https://api.minimaxi.com
  • Auth: Authorization: Bearer {MINIMAX_API_KEY}
  • Models: speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo
  • Text limit: 10,000 characters per request
  • Pause marker: <#x#> where x is the pause length in seconds (0.01 to 99.99)
  • Interjection tags (speech-2.8 only): (laughs), (chuckle), (coughs), (sighs), (breath), etc.
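Putting the details above together, a request can be assembled with only the standard library. This is a sketch under assumptions: the endpoint, base URL, auth header, model name, pause syntax, and 10,000-character limit come from the list above, but the nested voice_setting layout is an assumption to verify against the official t2a_v2 reference, and len() may count characters differently from the service. The function names are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.minimaxi.com"  # from the API details above
MAX_CHARS = 10_000                     # per-request text limit


def pause(seconds):
    """Return a pause marker such as <#1.5#> (0.01 to 99.99 seconds)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{round(seconds, 2):g}#>"


def build_tts_request(text, voice_id, model="speech-2.8-hd", api_key=None):
    """Build (but do not send) a POST request to /v1/t2a_v2."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; limit is {MAX_CHARS}")
    api_key = api_key or os.environ["MINIMAX_API_KEY"]
    body = {
        "model": model,
        "text": text,
        # Assumed field layout; confirm against the t2a_v2 API reference.
        "voice_setting": {"voice_id": voice_id},
    }
    return urllib.request.Request(
        BASE_URL + "/v1/t2a_v2",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

urllib.request.urlopen(req) would send the request. This guide does not describe the response schema, so check the API reference before decoding the returned audio.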