Base code

This commit is contained in:
Kunthawat Greethong
2026-01-08 22:39:53 +07:00
parent 697115c61a
commit c35fa52117
2169 changed files with 626670 additions and 0 deletions

# Persona System: Voice Cloning & Avatar Hyper-Personalization
## Executive Summary
This document outlines the integration of voice cloning and AI avatar capabilities into ALwrity's Persona System to enable true hyper-personalization. Users will train their voice and create their avatar during onboarding, then reuse them across all content generation (LinkedIn, Blog, Story Writer, etc.) for a consistent brand identity.
---
## Vision: AI Hyper-Personalization
**Goal**: Every piece of content generated by ALwrity should feel authentically "you", not just in writing style but also in voice and visual presence.

**Current State**: The persona system handles writing style only.

**Target State**: The persona system handles writing style + voice + avatar = complete brand identity.
---
## Current Persona System Analysis
### Existing Capabilities
- **Writing Style Analysis**: Tone, voice, complexity, engagement level
- **Platform Adaptation**: LinkedIn, Facebook, Blog optimizations
- **Content Characteristics**: Sentence structure, vocabulary, patterns
- **Onboarding Integration**: Automatically generated from onboarding data
### Current Limitations
- No voice/personality in audio content
- No visual representation
- Limited to text-based personalization
- Cannot create video content with user's presence
### Persona System Architecture
**Location**: `backend/services/persona_analysis_service.py`

**Current Flow**:
1. User completes onboarding (6 steps)
2. System analyzes website content and writing style
3. Core persona generated
4. Platform-specific adaptations created
5. Persona saved to database

**Database Model**: `backend/models/persona_models.py` (`WritingPersona` table)
---
## Proposed Enhancements
### 1. Voice Cloning Integration
#### 1.1 Voice Training During Onboarding
**Integration Point**: Onboarding Step 6 (Persona Generation)

**New Onboarding Flow**:
```
Step 1-5: Existing onboarding steps
Step 6: Persona Generation
├─ Writing Style Analysis (existing)
├─ Voice Training (NEW)
│ ├─ Audio sample upload (1-3 minutes)
│ ├─ Voice clone training (~2-5 minutes)
│ └─ Voice preview and approval
└─ Avatar Creation (NEW)
  ├─ Photo upload
  ├─ Avatar generation
  └─ Avatar preview and approval
```
**Implementation**:

**Backend**: `backend/services/persona/voice_persona_service.py` (NEW)
```python
from typing import Any, Dict


class VoicePersonaService:
    """
    Manages voice cloning for the persona system.
    Integrates with the Minimax voice clone API.
    """

    def train_voice_from_audio(
        self,
        user_id: str,
        audio_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Train a voice clone from the user's audio sample and link it to the persona.
        """
        # 1. Validate audio file (format, length, quality)
        # 2. Upload to Minimax
        # 3. Train voice clone
        # 4. Store voice_id in persona
        # 5. Return training status
        raise NotImplementedError

    def generate_audio_with_persona_voice(
        self,
        text: str,
        persona_id: int,
        emotion: str = "neutral",
        speed: float = 1.0,
    ) -> bytes:
        """
        Generate audio using the persona's cloned voice.
        """
        # 1. Get voice_id from persona
        # 2. Call Minimax voice generation
        # 3. Return audio bytes
        raise NotImplementedError
```
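Step 1 of `train_voice_from_audio` (validate the sample) could begin with a duration check against the 1-3 minute window from the onboarding flow. The sketch below assumes the upload has already been converted to PCM WAV so it can use Python's standard `wave` module; `validate_voice_sample` and the bound constants are hypothetical names, and format conversion and quality checks (noise, clipping) are out of scope.

```python
import wave
from typing import List

# Hypothetical bounds taken from the 1-3 minute sample window in the onboarding flow.
MIN_SAMPLE_SECONDS = 60
MAX_SAMPLE_SECONDS = 180


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_voice_sample(path: str) -> List[str]:
    """Return validation errors; an empty list means the sample length is acceptable."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError) as exc:
        return [f"unreadable WAV file: {exc}"]
    errors: List[str] = []
    if duration < MIN_SAMPLE_SECONDS:
        errors.append(f"sample too short: {duration:.1f}s < {MIN_SAMPLE_SECONDS}s")
    if duration > MAX_SAMPLE_SECONDS:
        errors.append(f"sample too long: {duration:.1f}s > {MAX_SAMPLE_SECONDS}s")
    return errors
```

Returning a list of messages rather than raising lets the onboarding UI show every problem at once before offering the retry option.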
**Database Schema Update**: `backend/models/persona_models.py`
```python
from sqlalchemy import Column, DateTime, String


# Base is the existing declarative base used in persona_models.py.
class WritingPersona(Base):
    # Existing fields...

    # NEW: Voice cloning fields
    voice_id = Column(String(255), nullable=True)
    # One of: 'not_trained', 'training', 'ready', 'failed'
    voice_training_status = Column(String(50), nullable=True)
    voice_training_audio_url = Column(String(500), nullable=True)
    voice_trained_at = Column(DateTime, nullable=True)

    # NEW: Avatar fields
    avatar_id = Column(String(255), nullable=True)
    avatar_image_url = Column(String(500), nullable=True)
    avatar_training_status = Column(String(50), nullable=True)
    avatar_created_at = Column(DateTime, nullable=True)
```
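The `voice_training_status` comment lists four states. A small state machine such as the sketch below would keep the column from jumping straight from `not_trained` to `ready`. `TrainingStatus`, `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, the transition rules are an assumption, and a NULL column is treated as `not_trained`.

```python
from enum import Enum
from typing import Dict, Optional, Set


class TrainingStatus(str, Enum):
    """Values listed for voice_training_status in the schema above."""

    NOT_TRAINED = "not_trained"
    TRAINING = "training"
    READY = "ready"
    FAILED = "failed"


# Assumed rules: a running job ends as READY or FAILED, users may retrain
# from READY or FAILED, and deleting a voice resets it to NOT_TRAINED.
ALLOWED_TRANSITIONS: Dict[TrainingStatus, Set[TrainingStatus]] = {
    TrainingStatus.NOT_TRAINED: {TrainingStatus.TRAINING},
    TrainingStatus.TRAINING: {TrainingStatus.READY, TrainingStatus.FAILED},
    TrainingStatus.READY: {TrainingStatus.TRAINING, TrainingStatus.NOT_TRAINED},
    TrainingStatus.FAILED: {TrainingStatus.TRAINING, TrainingStatus.NOT_TRAINED},
}


def transition(current: Optional[str], new: str) -> TrainingStatus:
    """Validate a status change and return the new status."""
    src = TrainingStatus(current) if current else TrainingStatus.NOT_TRAINED
    dst = TrainingStatus(new)
    if dst not in ALLOWED_TRANSITIONS[src]:
        raise ValueError(f"illegal transition: {src.value} -> {dst.value}")
    return dst
```

Because `TrainingStatus` subclasses `str`, its members can be written straight into the `String(50)` column.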
**Frontend**: `frontend/src/components/Onboarding/PersonaGenerationStep.tsx` (NEW)
```typescript
import React from 'react';

// OnboardingData and Persona are existing app types, imported elsewhere.
interface PersonaGenerationStepProps {
  onboardingData: OnboardingData;
  onComplete: (persona: Persona) => void;
}

const PersonaGenerationStep: React.FC<PersonaGenerationStepProps> = ({
  onboardingData,
  onComplete,
}) => {
  // 1. Show writing style analysis progress
  // 2. Show voice training section
  // 3. Show avatar creation section
  // 4. Preview complete persona
  // 5. Allow approval/modification
  return null; // placeholder until the sections above are implemented
};

export default PersonaGenerationStep;
```
#### 1.2 Voice Usage Across Platform
**Integration Points**:
- **Story Writer**: Use persona voice for audio narration
- **LinkedIn**: Voice-over for video posts
- **Blog**: Audio narration for blog posts
- **Email**: Personalized voice messages
- **Social Media**: Video content with user's voice

**Implementation Pattern**:
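```python
# In any content generation service.
# get_persona, generate_text, voice_service and avatar_service stand for the
# existing persona lookup and text generator plus the two new persona services.
def generate_content_with_persona(user_id: str, content_type: str) -> dict:
    # 1. Get user's persona
    persona = get_persona(user_id)

    # 2. Generate text content (existing)
    text_content = generate_text(persona)

    # Audio and video are optional; default to None so the return is always valid
    audio_content = None
    video_content = None

    # 3. Generate audio with persona voice (NEW)
    if persona.voice_id and persona.voice_training_status == 'ready':
        audio_content = voice_service.generate_audio_with_persona_voice(
            text=text_content,
            persona_id=persona.id,
        )

    # 4. Generate video with persona avatar (NEW); the talking avatar needs audio
    if audio_content and persona.avatar_id and persona.avatar_training_status == 'ready':
        video_content = avatar_service.generate_video_with_persona_avatar(
            text=text_content,
            audio_bytes=audio_content,
            persona_id=persona.id,
        )

    return {
        'text': text_content,
        'audio': audio_content,
        'video': video_content,
    }
```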
---
### 2. Avatar Creation Integration
#### 2.1 Avatar Training During Onboarding
**Integration Point**: Onboarding Step 6 (Persona Generation)

**Avatar Options**:
1. **Hunyuan Avatar**: Talking avatar from photo + audio
2. **InfiniteTalk**: Long-form avatar videos
3. **Custom Avatar**: User's photo as avatar base

**Implementation**:

**Backend**: `backend/services/persona/avatar_persona_service.py` (NEW)
```python
from typing import Any, Dict


class AvatarPersonaService:
    """
    Manages avatar creation for the persona system.
    Integrates with WaveSpeed Hunyuan Avatar and InfiniteTalk.
    """

    def create_avatar_from_photo(
        self,
        user_id: str,
        photo_file_path: str,
        persona_id: int,
    ) -> Dict[str, Any]:
        """
        Create an avatar from the user's photo.
        Uses Hunyuan Avatar for initial creation.
        """
        # 1. Validate photo (format, size, quality)
        # 2. Upload to WaveSpeed
        # 3. Create avatar
        # 4. Store avatar_id in persona
        # 5. Return avatar preview
        raise NotImplementedError

    def generate_video_with_persona_avatar(
        self,
        text: str,
        audio_bytes: bytes,
        persona_id: int,
        duration: int = 60,  # seconds
    ) -> bytes:
        """
        Generate a video of the persona's avatar speaking.
        Uses InfiniteTalk for long-form, Hunyuan for short.
        """
        # 1. Get avatar_id from persona
        # 2. Use audio_bytes (already generated with the persona's cloned voice)
        # 3. Call WaveSpeed API
        # 4. Return video bytes
        raise NotImplementedError
```
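The avatar service picks InfiniteTalk for long-form video and Hunyuan Avatar for short clips, but this document does not define where "short" ends. The sketch below makes that cut-over an explicit, hypothetical constant (`SHORT_CLIP_MAX_SECONDS = 30`) and enforces the 10-minute InfiniteTalk ceiling from the cost section. `choose_avatar_engine` and the returned engine names are placeholders, not WaveSpeed model identifiers.

```python
# Hypothetical cut-over between Hunyuan Avatar (short) and InfiniteTalk (long-form).
SHORT_CLIP_MAX_SECONDS = 30
# InfiniteTalk supports videos "up to 10 minutes" per the cost section.
INFINITETALK_MAX_SECONDS = 10 * 60


def choose_avatar_engine(duration_seconds: int) -> str:
    """Pick an avatar engine name for a requested video length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > INFINITETALK_MAX_SECONDS:
        raise ValueError("duration exceeds the 10-minute InfiniteTalk limit")
    if duration_seconds <= SHORT_CLIP_MAX_SECONDS:
        return "hunyuan_avatar"
    return "infinitetalk"
```

Keeping the threshold in one constant makes it easy to tune once real quality and cost comparisons between the two engines are available.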
#### 2.2 Avatar Usage Across Platform
**Use Cases**:
- **LinkedIn Video Posts**: User's avatar presenting content
- **Story Writer**: Avatar narrating story scenes
- **Blog Videos**: Avatar explaining blog content
- **Email Campaigns**: Personalized video messages
- **Social Media**: Consistent avatar across platforms
---
### 3. Enhanced Persona Management
#### 3.1 Persona Dashboard
**New UI Component**: `frontend/src/components/Persona/PersonaDashboard.tsx`

**Features**:
- Persona overview (writing style, voice, avatar)
- Voice training status and preview
- Avatar preview and management
- Usage statistics (where persona is used)
- Edit/update options
#### 3.2 Persona Settings
**New UI Component**: `frontend/src/components/Persona/PersonaSettings.tsx`

**Settings**:
- Voice parameters (emotion, speed, tone)
- Avatar appearance (clothing, background, style)
- Platform-specific adaptations
- Content type preferences
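The voice parameters in Persona Settings could be stored as a validated value object so invalid settings are rejected before any TTS call. In this sketch the defaults mirror `generate_audio_with_persona_voice` (`emotion="neutral"`, `speed=1.0`), but the allowed emotion set and the 0.5-2.0 speed range are assumptions to verify against the Minimax TTS documentation; `VoiceSettings` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed values: confirm against the Minimax TTS documentation before use.
ALLOWED_EMOTIONS = {"neutral", "happy", "sad", "angry", "surprised"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass(frozen=True)
class VoiceSettings:
    """Per-persona voice parameters, validated on construction."""

    emotion: str = "neutral"
    speed: float = 1.0

    def __post_init__(self) -> None:
        if self.emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"unsupported emotion: {self.emotion!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed {self.speed} outside [{MIN_SPEED}, {MAX_SPEED}]")
```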
---
## Implementation Phases
### Phase 1: Voice Cloning Integration (Week 1-3)
**Priority**: HIGH - Core hyper-personalization feature

**Tasks**:
1. ✅ Create `VoicePersonaService`
2. ✅ Integrate Minimax voice clone API
3. ✅ Add voice fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with voice training
5. ✅ Create voice training UI component
6. ✅ Add voice preview and testing
7. ✅ Integrate voice into Story Writer
8. ✅ Add voice usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/voice_persona_service.py`
- `frontend/src/components/Onboarding/VoiceTrainingSection.tsx`
- `frontend/src/components/Persona/VoiceManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `backend/api/onboarding_utils/` (onboarding routes)
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/audio_generation_service.py`

**Success Criteria**:
- Users can train voice during onboarding
- Voice used automatically in Story Writer
- Voice quality clearly better than gTTS (meets the >4.5/5 voice satisfaction target)
- Voice linked to persona
- Cost tracking accurate
---
### Phase 2: Avatar Creation Integration (Week 4-6)
**Priority**: HIGH - Visual personalization

**Tasks**:
1. ✅ Create `AvatarPersonaService`
2. ✅ Integrate Hunyuan Avatar API
3. ✅ Add avatar fields to `WritingPersona` model
4. ✅ Update onboarding Step 6 with avatar creation
5. ✅ Create avatar creation UI component
6. ✅ Add avatar preview and testing
7. ✅ Integrate avatar into content generation
8. ✅ Add avatar usage tracking
9. ✅ Update persona dashboard
10. ✅ Testing and optimization

**Files to Create**:
- `backend/services/persona/avatar_persona_service.py`
- `frontend/src/components/Onboarding/AvatarCreationSection.tsx`
- `frontend/src/components/Persona/AvatarManagement.tsx`

**Files to Modify**:
- `backend/models/persona_models.py`
- `backend/services/persona_analysis_service.py`
- `frontend/src/components/Onboarding/PersonaGenerationStep.tsx`
- `backend/services/story_writer/video_generation_service.py`

**Success Criteria**:
- Users can create avatar during onboarding
- Avatar used in video content generation
- Avatar quality meets the >4.5/5 avatar satisfaction target
- Avatar linked to persona
- Cost tracking accurate
---
### Phase 3: Cross-Platform Integration (Week 7-8)
**Priority**: MEDIUM - Complete hyper-personalization

**Tasks**:
1. ✅ Integrate persona voice into LinkedIn Writer
2. ✅ Integrate persona avatar into LinkedIn Writer
3. ✅ Integrate persona voice into Blog Writer
4. ✅ Integrate persona avatar into Blog Writer
5. ✅ Add persona usage analytics
6. ✅ Update all content generation services
7. ✅ Create persona usage dashboard
8. ✅ Documentation and user guides

**Success Criteria**:
- Persona voice/avatar used across all platforms
- Consistent brand identity
- Good user experience
- Analytics working
---
## Cost Management
### Voice Cloning Costs
- **One-Time Training**: $0.75 per voice
- **Per-Minute Generation**: $0.02 per generated minute

**Cost Optimization**:
- Train voice once during onboarding (included in Pro/Enterprise)
- Free tier: gTTS only
- Basic tier: Voice training available ($0.75 one-time)
- Pro/Enterprise: Voice training included
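With these rates, voice spend is the one-time training fee (when it applies) plus $0.02 per generated minute. For example, a Basic user who trains a voice and then generates 30 minutes of audio costs 0.75 + 0.02 × 30 = $1.35. A minimal estimator, using `Decimal` to avoid floating-point rounding in currency (`estimate_voice_cost` is a hypothetical helper):

```python
from decimal import Decimal

VOICE_TRAINING_USD = Decimal("0.75")    # one-time, per trained voice
VOICE_PER_MINUTE_USD = Decimal("0.02")  # per minute of generated audio


def estimate_voice_cost(minutes_generated: Decimal, include_training: bool) -> Decimal:
    """Estimate voice spend in USD.

    include_training: whether the one-time training fee falls inside this
    estimate (e.g. a Basic user training a voice for the first time).
    """
    fee = VOICE_TRAINING_USD if include_training else Decimal("0")
    return fee + VOICE_PER_MINUTE_USD * minutes_generated
```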
### Avatar Creation Costs
- **Hunyuan Avatar**: $0.15-0.30 per 5 seconds
- **InfiniteTalk**: $0.15-0.30 per 5 seconds (videos up to 10 minutes)

**Cost Optimization**:
- Avatar creation: One-time during onboarding
- Video generation: Pay-per-use
- Default to shorter videos (5 seconds)
- Allow longer videos for premium users
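Because both avatar providers are listed at $0.15-0.30 per 5 seconds, a video estimate is a low/high range. The sketch below assumes a partial 5-second unit is billed as a full unit (not confirmed by the pricing above), so a 60-second video is 12 units, or $1.80-$3.60, while a 5-second default clip costs $0.15-$0.30. `estimate_avatar_cost` is a hypothetical helper.

```python
import math
from decimal import Decimal
from typing import Tuple

# Both Hunyuan Avatar and InfiniteTalk are listed at $0.15-0.30 per 5 seconds.
RATE_LOW_USD = Decimal("0.15")
RATE_HIGH_USD = Decimal("0.30")
BILLING_UNIT_SECONDS = 5
MAX_SECONDS = 10 * 60  # InfiniteTalk upper bound


def estimate_avatar_cost(duration_seconds: int) -> Tuple[Decimal, Decimal]:
    """Return a (low, high) USD range; assumes partial units round up."""
    if not 0 < duration_seconds <= MAX_SECONDS:
        raise ValueError("duration must be between 1 second and 10 minutes")
    units = math.ceil(duration_seconds / BILLING_UNIT_SECONDS)
    return RATE_LOW_USD * units, RATE_HIGH_USD * units
```

Showing this range before generation supports the "clear pricing, cost estimates" mitigation for cost concerns.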
### Subscription Integration
**Update Subscription Tiers**:
- **Free**: Writing persona only, no voice/avatar
- **Basic**: Writing persona + voice training ($0.75 one-time)
- **Pro**: Writing persona + voice + avatar creation included
- **Enterprise**: All features + unlimited usage
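The tier list translates directly into a feature check that onboarding and the content services could share. This sketch transcribes the list above into a lookup table; `Tier`, `TIER_FEATURES` and `can_use` are hypothetical names, and Enterprise's "unlimited usage" is a quota concern that a feature flag alone does not capture.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Tier(str, Enum):
    FREE = "free"
    BASIC = "basic"
    PRO = "pro"
    ENTERPRISE = "enterprise"


# Feature matrix transcribed from the subscription tier list.
TIER_FEATURES: Dict[Tier, FrozenSet[str]] = {
    Tier.FREE: frozenset({"writing_persona"}),
    Tier.BASIC: frozenset({"writing_persona", "voice"}),
    Tier.PRO: frozenset({"writing_persona", "voice", "avatar"}),
    Tier.ENTERPRISE: frozenset({"writing_persona", "voice", "avatar"}),
}


def can_use(tier: str, feature: str) -> bool:
    """Return True if the given tier unlocks the feature."""
    return feature in TIER_FEATURES[Tier(tier)]
```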
---
## User Experience Flow
### Onboarding Flow (Enhanced)
```
Step 1-5: Existing onboarding steps
Step 6: Persona Generation
├─ Writing Style Analysis
│ └─ [Progress: Analyzing your writing style...]
├─ Voice Training (NEW)
│ ├─ Upload audio sample (1-3 minutes)
│ ├─ [Training your voice...] (~2-5 minutes)
│ ├─ Preview generated voice
│ └─ Approve or retrain
└─ Avatar Creation (NEW)
  ├─ Upload photo
  ├─ [Creating your avatar...] (~1-2 minutes)
  ├─ Preview avatar
  └─ Approve or recreate
Step 7: Persona Preview
├─ Writing Style Summary
├─ Voice Preview
├─ Avatar Preview
└─ Approve Complete Persona
```
### Content Generation Flow (Enhanced)
```
User creates content (LinkedIn/Blog/Story)
System loads user's persona
├─ Writing style → Text generation
├─ Voice ID → Audio generation (if available)
└─ Avatar ID → Video generation (if available)
Content generated with full personalization
├─ Text matches writing style
├─ Audio uses user's voice
└─ Video shows user's avatar
```
---
## Technical Architecture
### Backend Services
```
backend/services/
├── persona/
│ ├── __init__.py
│ ├── voice_persona_service.py # NEW: Voice cloning
│ ├── avatar_persona_service.py # NEW: Avatar creation
│ └── persona_analysis_service.py # Enhanced (currently at backend/services/persona_analysis_service.py)
├── minimax/
│ └── voice_clone.py # Shared with Story Writer
└── wavespeed/
  └── avatar_generation.py # Shared with Story Writer
```
### Frontend Components
```
frontend/src/components/
├── Onboarding/
│ ├── PersonaGenerationStep.tsx # Enhanced
│ ├── VoiceTrainingSection.tsx # NEW
│ └── AvatarCreationSection.tsx # NEW
└── Persona/
  ├── PersonaDashboard.tsx # NEW
  ├── VoiceManagement.tsx # NEW
  ├── AvatarManagement.tsx # NEW
  └── PersonaSettings.tsx # NEW
```
### Database Schema
```sql
-- Enhanced WritingPersona table
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;
ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
```
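The migration only adds nullable columns, so existing persona rows stay valid without a backfill. One way to smoke-test the statements is to run them against an in-memory SQLite table, as sketched below. The single-column stand-in table is an assumption, and `writing_persona` must match the model's actual `__tablename__`, which this document does not show; `migrated_columns` is a hypothetical helper.

```python
import sqlite3
from typing import List

# The ALTER statements from the migration above, unchanged.
MIGRATION_SQL = """
ALTER TABLE writing_persona ADD COLUMN voice_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN voice_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN voice_training_audio_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN voice_trained_at TIMESTAMP;
ALTER TABLE writing_persona ADD COLUMN avatar_id VARCHAR(255);
ALTER TABLE writing_persona ADD COLUMN avatar_image_url VARCHAR(500);
ALTER TABLE writing_persona ADD COLUMN avatar_training_status VARCHAR(50);
ALTER TABLE writing_persona ADD COLUMN avatar_created_at TIMESTAMP;
"""


def migrated_columns() -> List[str]:
    """Apply the migration to a stand-in table and return its column names."""
    conn = sqlite3.connect(":memory:")
    try:
        # Minimal stand-in for the existing table (the real one has more columns).
        conn.execute("CREATE TABLE writing_persona (id INTEGER PRIMARY KEY)")
        conn.executescript(MIGRATION_SQL)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in conn.execute("PRAGMA table_info(writing_persona)")]
    finally:
        conn.close()
```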
---
## Integration with Existing Systems
### Story Writer Integration
**Location**: `backend/services/story_writer/audio_generation_service.py`

**Enhancement**:
```python
from typing import Any, Dict


# Method of the existing audio generation service (excerpt)
def generate_scene_audio(
    self,
    scene: Dict[str, Any],
    user_id: str,
    use_persona_voice: bool = True,  # NEW: Use persona voice
) -> Dict[str, Any]:
    if use_persona_voice:
        # Get user's persona (may be None if the user has no persona yet)
        persona = get_persona(user_id)
        if persona and persona.voice_id and persona.voice_training_status == 'ready':
            # Use persona voice
            return self._generate_with_persona_voice(scene, persona)
    # Fallback to default provider (gTTS)
    return self._generate_with_gtts(scene)
```
### LinkedIn Writer Integration
**Enhancement**: Add video generation with persona avatar
- LinkedIn video posts with user's avatar
- Voice-over with user's voice
- Consistent brand presence
### Blog Writer Integration
**Enhancement**: Add audio/video options
- Audio narration with persona voice
- Video explanations with persona avatar
- Enhanced blog content
---
## Success Metrics
### Adoption Metrics
- Voice training completion rate (target: >60% of Pro users)
- Avatar creation completion rate (target: >50% of Pro users)
- Persona usage across platforms (target: >80% of content uses persona)
### Quality Metrics
- Voice quality satisfaction (target: >4.5/5)
- Avatar quality satisfaction (target: >4.5/5)
- Brand consistency score (target: >90%)
### Business Metrics
- User retention (persona users vs. non-persona)
- Content engagement (persona content vs. generic)
- Premium tier conversion (persona as differentiator)
---
## Risk Mitigation
| Risk | Mitigation |
|------|------------|
| Voice training failure | Quality checks, clear error messages, retry option |
| Avatar quality issues | Preview before approval, regeneration option |
| Cost concerns | Clear pricing, tier-based access, cost estimates |
| User privacy | Secure storage, opt-in consent, data encryption |
| API reliability | Fallback options, retry logic, error handling |
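For the retry-logic mitigation, provider calls (Minimax, WaveSpeed) could be wrapped in exponential backoff so a transient failure does not immediately fail onboarding or drop to the gTTS fallback. This is a sketch that assumes `TimeoutError` and `ConnectionError` are the transient errors worth retrying (real client libraries may raise their own exception types); `call_with_retry` is a hypothetical helper, and `sleep` is injectable for testing.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying the listed exceptions with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: caller decides on a fallback (e.g. gTTS)
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # add up to 10% jitter
    raise AssertionError("unreachable")
```

Exceptions outside `retry_on` (for example a validation error from a bad audio file) propagate immediately, since retrying them would only add latency.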
---
## Privacy & Security
### Data Storage
- Voice samples: Encrypted storage, deleted after training
- Avatar photos: Encrypted storage, user can delete
- Voice/Avatar IDs: Opaque provider identifiers containing no raw audio or image data; provider API keys stored securely
### User Control
- Users can delete voice/avatar anytime
- Users can retrain voice/avatar
- Users can opt-out of voice/avatar features
- Clear privacy policy
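Deleting a voice has to remove the provider-side clone as well as the local references. The sketch below uses a hypothetical stand-in dataclass for the voice columns and a hypothetical `delete_remote_voice` callback in place of a provider deletion endpoint. It deletes remotely first so that, if that call fails, the `voice_id` is still on record for a retry; avatar deletion would follow the same pattern.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class PersonaVoiceFields:
    """Stand-in for the voice columns on WritingPersona (illustration only)."""

    voice_id: Optional[str] = None
    voice_training_status: Optional[str] = None
    voice_training_audio_url: Optional[str] = None
    voice_trained_at: Optional[datetime] = None


def delete_persona_voice(
    persona: PersonaVoiceFields,
    delete_remote_voice: Callable[[str], None],
) -> None:
    """Delete the provider-side clone, then clear the local references."""
    if persona.voice_id:
        # Hypothetical callback wrapping the provider's voice-deletion endpoint;
        # if it raises, the local fields are left untouched for a retry.
        delete_remote_voice(persona.voice_id)
    persona.voice_id = None
    persona.voice_training_audio_url = None
    persona.voice_trained_at = None
    persona.voice_training_status = "not_trained"
```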
---
## Next Steps
1. **Week 1**: Set up Minimax API access
2. **Week 1-2**: Implement voice persona service
3. **Week 2-3**: Integrate into onboarding
4. **Week 3-4**: Integrate into Story Writer
5. **Week 4-5**: Set up WaveSpeed avatar API
6. **Week 5-6**: Implement avatar persona service
7. **Week 6-7**: Integrate into onboarding
8. **Week 7-8**: Cross-platform integration
---
*Document Version: 1.0*
*Last Updated: January 2025*
*Priority: HIGH - Core Hyper-Personalization Feature*