---
name: seo-analyzers
description: Analyze content quality with Thai language support. Use for keyword density, readability scoring, SEO quality rating (0-100), and AI pattern detection.
---

# 🔍 SEO Analyzers - Thai Language Content Analysis

**Skill Name:** `seo-analyzers`
**Category:** `quick`
**Load Skills:** `[]`

---

## 🚀 Purpose

Analyze content quality with full Thai language support:

- ✅ **Thai keyword density** - PyThaiNLP-based word counting
- ✅ **Thai readability scoring** - Grade level, formality detection
- ✅ **Content quality rating** - Overall 0-100 score
- ✅ **AI pattern detection** - Remove AI watermarks (Thai-aware)
- ✅ **Search intent analysis** - Classify Thai queries

**Use Cases:**
1. Analyze blog post quality before publishing
2. Check keyword density for Thai content
3. Score content quality (0-100)
4. Remove AI patterns from generated content
5. Analyze search intent for Thai keywords
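
The search-intent analysis listed above has no worked example later in this skill; a minimal rule-based sketch is shown here, assuming simple marker-word lists (the lists and the `classify_intent` name are illustrative, not the shipped implementation):

```python
# Hypothetical marker lists for rule-based intent classification.
# Illustrative only; the real skill may use different rules.
INTENT_MARKERS = {
    "informational": ["วิธี", "คือ", "ทำไม", "อย่างไร", "how", "what"],
    "transactional": ["ซื้อ", "ราคา", "สมัคร", "โปรโมชั่น", "buy", "price"],
    "navigational": ["เว็บไซต์", "login", "เข้าสู่ระบบ", "official"],
}

def classify_intent(query: str) -> str:
    """Return the intent whose marker words appear most often in the query."""
    query = query.lower()
    scores = {
        intent: sum(query.count(m) for m in markers)
        for intent, markers in INTENT_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # Default to informational when no marker matches
    return best if scores[best] > 0 else "informational"
```

Because Thai queries often mix Thai and English (e.g. "บริการ podcast"), the marker lists include both scripts.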

---

## 📋 Pre-Flight Questions

**MUST ask before analyzing:**

1. **Content to Analyze:**
   - Text content (paste directly)
   - File path (Markdown, TXT)
   - URL (fetch and analyze)

2. **Analysis Type:** (Default: All)
   - Keyword density
   - Readability score
   - Quality rating (0-100)
   - AI pattern detection
   - Search intent

3. **Target Keyword:** (For keyword analysis)
   - Primary keyword
   - Secondary keywords (optional)

4. **Content Language:** (Auto-detect or specify)
   - Thai
   - English
   - Auto-detect

---

## 🔄 Workflows

### **Workflow 1: Keyword Density Analysis**

```python
Input: Article text + target keyword
Process:
1. Count Thai words (PyThaiNLP)
2. Calculate keyword density
3. Check critical placements (H1, first 100 words, conclusion)
4. Detect keyword stuffing
Output:
- Word count
- Keyword occurrences
- Density percentage
- Status (too_low/optimal/too_high)
- Recommendations
```
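
The `status` value above can be derived from the Thai density targets this skill uses (1.0-1.5% optimal; see Important Notes). A minimal sketch of a `get_density_status` helper, with the 2.5% stuffing cutoff as an assumption:

```python
def get_density_status(density: float) -> str:
    """Map a keyword density percentage to a status label.

    Thresholds follow this skill's Thai targets (1.0-1.5% optimal);
    the keyword-stuffing cutoff of 2.5% is an assumption.
    """
    if density < 1.0:
        return "too_low"
    if density <= 1.5:
        return "optimal"
    if density <= 2.5:
        return "too_high"
    return "keyword_stuffing"
```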

### **Workflow 2: Readability Scoring**

```python
Input: Article text
Process:
1. Count sentences (Thai-aware)
2. Calculate average sentence length
3. Detect formality level (Thai particles)
4. Estimate grade level
Output:
- Avg sentence length
- Grade level (ม.6-ม.12 or 8-10)
- Formality score (กันเอง/ปกติ/เป็นทางการ)
- Readability recommendations
```

### **Workflow 3: Quality Rating (0-100)**

```python
Input: Article text + keyword
Process:
1. Keyword optimization (25 points)
2. Readability (25 points)
3. Content structure (25 points)
4. Brand voice alignment (25 points)
Output:
- Overall score (0-100)
- Category breakdowns
- Priority fixes
- Publishing readiness status
```

### **Workflow 4: AI Pattern Detection**

```python
Input: Generated content
Process:
1. Remove Unicode watermarks (zero-width spaces)
2. Replace em-dashes with appropriate punctuation
3. Detect AI patterns (repetitive structures)
4. Detect Thai-specific patterns (overly formal language)
Output:
- Cleaned content
- Statistics (chars removed, patterns fixed)
- AI probability score
```
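
Steps 1-2 of this workflow can be sketched with the standard library alone; this is a simplified illustration, not the full `content_scrubber_thai.py`:

```python
# Zero-width and BOM code points sometimes used as AI watermarks
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

def scrub_ai_patterns(text: str):
    """Remove zero-width characters and normalize em-dashes.

    Covers only steps 1-2 of the workflow; pattern detection
    (steps 3-4) needs language-aware analysis.
    """
    no_zw = text.translate({ord(c): None for c in ZERO_WIDTH})
    stats = {
        "zero_width_removed": len(text) - len(no_zw),
        "em_dashes_replaced": no_zw.count("\u2014"),
    }
    # Replace em-dashes with a spaced hyphen (simple heuristic;
    # "appropriate punctuation" depends on context)
    cleaned = no_zw.replace("\u2014", " - ")
    return cleaned, stats
```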

---

## 🔧 Technical Implementation

### **Thai Keyword Analyzer:**

```python
from typing import Dict

from pythainlp import word_tokenize
from pythainlp.util import normalize

def count_thai_words(text: str) -> int:
    """Count Thai words accurately (Thai has no spaces between words)"""
    tokens = word_tokenize(text, engine="newmm")
    # Drop whitespace-only tokens
    return len([t for t in tokens if t.strip()])

def calculate_density(text: str, keyword: str) -> float:
    """Calculate keyword density (%) for Thai text"""
    text_norm = normalize(text)
    keyword_norm = normalize(keyword)
    count = text_norm.count(keyword_norm)
    word_count = count_thai_words(text)
    return (count / word_count * 100) if word_count > 0 else 0

def check_critical_placements(text: str, keyword: str) -> Dict:
    """Check keyword in critical locations"""
    # check_h1 and get_density_status are helpers defined in the same script
    return {
        'in_first_100_words': keyword in text[:200],  # Thai words span more chars
        'in_h1': check_h1(text, keyword),
        'in_conclusion': keyword in text[-500:],
        'density_status': get_density_status(calculate_density(text, keyword))
    }
```

### **Thai Readability Scorer:**

```python
from typing import Dict

from pythainlp import sent_tokenize, word_tokenize

def calculate_thai_readability(text: str) -> Dict:
    """
    Thai readability scoring (adapted for Thai language)

    Thai doesn't have spaces between words, so we use:
    - Average sentence length (words per sentence)
    - Presence of formal/informal particles
    - Paragraph structure
    """
    # crfcut is PyThaiNLP's Thai-aware sentence segmenter
    sentences = sent_tokenize(text, engine="crfcut")
    total_words = sum(len(word_tokenize(s, engine="newmm")) for s in sentences)
    avg_sentence_length = total_words / len(sentences) if sentences else 0

    # Detect formality level
    formality = detect_thai_formality(text)

    # Estimate grade level
    if avg_sentence_length < 15:
        grade_level = "ง่าย (ม.6-ม.9)"
    elif avg_sentence_length < 25:
        grade_level = "ปานกลาง (ม.10-ม.12)"
    else:
        grade_level = "ยาก (ม.13+)"

    return {
        'avg_sentence_length': round(avg_sentence_length, 1),
        'grade_level': grade_level,
        'formality': formality,
        'score': calculate_readability_score(avg_sentence_length, formality)
    }

def detect_thai_formality(text: str) -> str:
    """
    Detect Thai formality level from particles and word choice
    """
    formal_particles = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
    informal_particles = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย', 'เนอะ', 'จ้า']

    formal_count = sum(text.count(p) for p in formal_particles)
    informal_count = sum(text.count(p) for p in informal_particles)

    total = formal_count + informal_count
    ratio = formal_count / total if total > 0 else 0.5

    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    elif ratio < 0.4:
        return "กันเอง (Casual)"
    else:
        return "ปกติ (Normal)"
```
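
`calculate_readability_score` is called above but defined elsewhere in the script; one possible stdlib-only sketch, with the penalty weights as assumptions:

```python
def calculate_readability_score(avg_sentence_length: float, formality: str) -> int:
    """Score readability 0-100: shorter sentences score higher.

    Weights are assumptions: 3 points off per word beyond 15 words
    per sentence, plus a small bonus for a neutral register.
    """
    score = 100 - max(0.0, avg_sentence_length - 15) * 3
    if "Normal" in formality:
        score += 5  # neutral tone is easiest to read
    return int(max(0, min(100, score)))
```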

### **Content Quality Scorer:**

```python
from typing import Dict

def calculate_quality_score(text: str, keyword: str, brand_voice: Dict) -> Dict:
    """
    Calculate overall content quality score (0-100)

    Categories:
    - Keyword Optimization: 25 points
    - Readability: 25 points
    - Content Structure: 25 points
    - Brand Voice Alignment: 25 points
    """
    scores = {
        'keyword_optimization': score_keyword_optimization(text, keyword),
        'readability': score_readability(text),
        'structure': score_structure(text),
        'brand_voice': score_brand_voice(text, brand_voice)
    }

    total = sum(scores.values())

    return {
        'overall_score': round(total, 1),
        'categories': scores,
        'status': get_quality_status(total),
        'recommendations': get_quality_recommendations(scores)
    }

def score_keyword_optimization(text: str, keyword: str) -> float:
    """Score keyword optimization (0-25)"""
    density = calculate_density(text, keyword)
    placements = check_critical_placements(text, keyword)

    score = 0

    # Density score (10 points)
    if 1.0 <= density <= 1.5:
        score += 10
    elif 0.5 <= density < 1.0 or 1.5 < density <= 2.0:
        score += 5

    # Critical placements (15 points)
    if placements['in_first_100_words']:
        score += 5
    if placements['in_h1']:
        score += 5
    if placements['in_conclusion']:
        score += 5

    return score
```
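
`score_readability`, `score_structure`, `score_brand_voice`, and the recommendation helpers are likewise defined elsewhere in the script. As an illustration only, `score_structure` could look like this for Markdown input (the heading and paragraph thresholds are assumptions):

```python
import re

def score_structure(text: str) -> float:
    """Score content structure 0-25 from Markdown headings and paragraphs."""
    h2_count = len(re.findall(r"^## ", text, flags=re.MULTILINE))
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    score = 0.0
    if h2_count >= 3:          # enough subheadings to scan
        score += 10
    elif h2_count >= 1:
        score += 5
    if len(paragraphs) >= 5:   # content is broken into paragraphs
        score += 10
    elif len(paragraphs) >= 2:
        score += 5
    # Short paragraphs are easier to read on mobile
    if paragraphs and max(len(p) for p in paragraphs) < 800:
        score += 5
    return score
```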

---

## 📁 Commands

### **Analyze Keyword Density:**

```bash
python3 skills/seo-analyzers/scripts/thai_keyword_analyzer.py \
  --text "บทความเกี่ยวกับบริการ podcast hosting..." \
  --keyword "บริการ podcast" \
  --language th
```

### **Score Content Quality:**

```bash
python3 skills/seo-analyzers/scripts/content_quality_scorer.py \
  --file drafts/article.md \
  --keyword "podcast hosting" \
  --context "./website/context/"
```

### **Check Readability:**

```bash
python3 skills/seo-analyzers/scripts/thai_readability.py \
  --text "เนื้อหาบทความภาษาไทย..." \
  --language th
```

### **Clean AI Patterns:**

```bash
python3 skills/seo-analyzers/scripts/content_scrubber_thai.py \
  --file drafts/ai-generated.md \
  --output drafts/cleaned.md \
  --verbose
```

---

## ⚙️ Environment Variables

**Optional (in unified .env):**

```bash
# No API keys required for seo-analyzers
# All processing is local with PyThaiNLP

# Optional: For advanced NLP
NLTK_DATA_PATH=/path/to/nltk_data
```

---

## 📊 Output Examples

### **Keyword Analysis Output:**

```json
{
  "word_count": 1847,
  "keyword": "บริการ podcast",
  "occurrences": 23,
  "density": 1.25,
  "status": "optimal",
  "critical_placements": {
    "in_first_100_words": true,
    "in_h1": true,
    "in_conclusion": true,
    "in_h2_count": 3
  },
  "keyword_stuffing_risk": "none",
  "recommendations": []
}
```

### **Readability Output:**

```json
{
  "avg_sentence_length": 18.5,
  "grade_level": "ปานกลาง (ม.10-ม.12)",
  "formality": "ปกติ (Normal)",
  "score": 75,
  "details": {
    "sentence_count": 98,
    "paragraph_count": 24,
    "avg_paragraph_length": 4.1
  },
  "recommendations": [
    "ลดความยาวประโยคบ้าง (บางประโยคยาวเกินไป)",
    "รักษาระดับความเป็นกันเองนี้ไว้"
  ]
}
```

### **Quality Score Output:**

```json
{
  "overall_score": 82.5,
  "categories": {
    "keyword_optimization": 22.5,
    "readability": 20.0,
    "structure": 23.0,
    "brand_voice": 17.0
  },
  "status": "good",
  "publishing_readiness": "Ready with minor tweaks",
  "priority_fixes": [
    "ปรับ brand voice ให้เป็นกันเองมากขึ้น",
    "เพิ่ม internal links 2-3 แห่ง"
  ],
  "recommendations": [
    "เพิ่มคำหลักใน H2 อีก 1-2 แห่ง",
    "ย่อหน้าบางตอนยาวเกินไป แบ่งออกเป็น 2 ย่อหน้า"
  ]
}
```

---

## ✅ Quality Thresholds

| Score Range | Status | Action |
|-------------|--------|--------|
| 90-100 | Excellent | Publish immediately |
| 80-89 | Good | Minor tweaks, publishable |
| 70-79 | Fair | Address priority fixes |
| Below 70 | Needs Work | Significant improvements required |
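
These thresholds map directly onto the `get_quality_status` helper used by the quality scorer; a minimal sketch (the snake_case labels are assumptions, chosen to match the `"status": "good"` example above):

```python
def get_quality_status(score: float) -> str:
    """Map an overall 0-100 score to the status labels in the table."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "needs_work"
```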

---

## ⚠️ Important Notes

1. **Thai Word Counting:** Uses PyThaiNLP for accurate counting (no spaces between Thai words)

2. **Formality Detection:** Auto-detects from particles (ครับ/ค่ะ vs นะ/จ้ะ)

3. **Keyword Density:** Thai target is 1.0-1.5% (lower than English 1.5-2.0%)

4. **Readability:** Thai grade levels (ม.6-ม.12) instead of Flesch scores

5. **AI Patterns:** Thai-specific patterns (overly formal, repetitive structures)

---

## 🔄 Integration with Other Skills

- **seo-multi-channel:** Calls for quality scoring before output
- **seo-context:** Loads brand voice for alignment scoring
- **website-creator:** Validates content before publishing

---

**Use this skill when you need to analyze content quality, check keyword density, or clean AI patterns from Thai or English content.**