---
name: seo-analyzers
description: Analyze content quality with Thai language support. Use for keyword density, readability scoring, SEO quality rating (0-100), and AI pattern detection.
---

# 🔍 SEO Analyzers - Thai Language Content Analysis

**Skill Name:** `seo-analyzers`
**Category:** `quick`
**Load Skills:** `[]`

---

## 🚀 Purpose

Analyze content quality with full Thai language support:

- ✅ **Thai keyword density** - PyThaiNLP-based word counting
- ✅ **Thai readability scoring** - Grade level, formality detection
- ✅ **Content quality rating** - Overall 0-100 score
- ✅ **AI pattern detection** - Remove AI watermarks (Thai-aware)
- ✅ **Search intent analysis** - Classify Thai queries

**Use Cases:**

1. Analyze blog post quality before publishing
2. Check keyword density for Thai content
3. Score content quality (0-100)
4. Remove AI patterns from generated content
5. Analyze search intent for Thai keywords

---

## 📋 Pre-Flight Questions

**MUST ask before analyzing:**

1. **Content to Analyze:**
   - Text content (paste directly)
   - File path (Markdown, TXT)
   - URL (fetch and analyze)

2. **Analysis Type:** (Default: All)
   - Keyword density
   - Readability score
   - Quality rating (0-100)
   - AI pattern detection
   - Search intent

3. **Target Keyword:** (For keyword analysis)
   - Primary keyword
   - Secondary keywords (optional)

4. **Content Language:** (Auto-detect or specify)
   - Thai
   - English
   - Auto-detect

---

## 🔄 Workflows

### **Workflow 1: Keyword Density Analysis**

```python
Input: Article text + target keyword

Process:
1. Count Thai words (PyThaiNLP)
2. Calculate keyword density
3. Check critical placements (H1, first 100 words, conclusion)
4. Detect keyword stuffing

Output:
- Word count
- Keyword occurrences
- Density percentage
- Status (too_low/optimal/too_high)
- Recommendations
```

### **Workflow 2: Readability Scoring**

```python
Input: Article text

Process:
1. Count sentences (Thai-aware)
2. Calculate average sentence length
3. Detect formality level (Thai particles)
4. Estimate grade level

Output:
- Avg sentence length
- Grade level (Thai school level: ม.ต้น / ม.ปลาย / อุดมศึกษา)
- Formality score (กันเอง/ปกติ/เป็นทางการ)
- Readability recommendations
```

### **Workflow 3: Quality Rating (0-100)**

```python
Input: Article text + keyword

Process:
1. Keyword optimization (25 points)
2. Readability (25 points)
3. Content structure (25 points)
4. Brand voice alignment (25 points)

Output:
- Overall score (0-100)
- Category breakdowns
- Priority fixes
- Publishing readiness status
```

### **Workflow 4: AI Pattern Detection**

```python
Input: Generated content

Process:
1. Remove Unicode watermarks (zero-width spaces)
2. Replace em-dashes with appropriate punctuation
3. Detect AI patterns (repetitive structures)
4. Thai-specific patterns (overly formal language)

Output:
- Cleaned content
- Statistics (chars removed, patterns fixed)
- AI probability score
```

---

## 🔧 Technical Implementation

### **Thai Keyword Analyzer:**

```python
from typing import Dict

from pythainlp import word_tokenize
from pythainlp.util import normalize

def count_thai_words(text: str) -> int:
    """Count Thai words accurately (no spaces between words)"""
    tokens = word_tokenize(text, engine="newmm")
    return len([t for t in tokens if t.strip()])

def calculate_density(text: str, keyword: str) -> float:
    """Calculate keyword density for Thai text"""
    text_norm = normalize(text)
    keyword_norm = normalize(keyword)
    count = text_norm.count(keyword_norm)
    word_count = count_thai_words(text)
    return (count / word_count * 100) if word_count > 0 else 0

def check_critical_placements(text: str, keyword: str) -> Dict:
    """Check keyword in critical locations.

    check_h1 and get_density_status are helpers defined elsewhere
    in the script.
    """
    return {
        'in_first_100_words': keyword in text[:200],  # Thai chars are longer
        'in_h1': check_h1(text, keyword),
        'in_conclusion': keyword in text[-500:],
        'density_status': get_density_status(calculate_density(text, keyword))
    }
```

### **Thai Readability Scorer:**

```python
from typing import Dict

from pythainlp import sent_tokenize, word_tokenize
def calculate_thai_readability(text: str) -> Dict:
    """
    Thai readability scoring (adapted for Thai language)

    Thai doesn't have spaces between words, so we use:
    - Average sentence length (words per sentence)
    - Presence of formal/informal particles
    - Paragraph structure
    """
    # "whitespace" splits on spaces; Thai typically separates sentences
    # with a space rather than with punctuation
    sentences = sent_tokenize(text, engine="whitespace")
    total_words = sum(len(word_tokenize(s, engine="newmm")) for s in sentences)
    avg_sentence_length = total_words / len(sentences) if sentences else 0

    # Detect formality level
    formality = detect_thai_formality(text)

    # Estimate grade level (Thai secondary school runs ม.1-ม.6)
    if avg_sentence_length < 15:
        grade_level = "ง่าย (ม.ต้น: ม.1-ม.3)"
    elif avg_sentence_length < 25:
        grade_level = "ปานกลาง (ม.ปลาย: ม.4-ม.6)"
    else:
        grade_level = "ยาก (อุดมศึกษา)"

    return {
        'avg_sentence_length': round(avg_sentence_length, 1),
        'grade_level': grade_level,
        'formality': formality,
        'score': calculate_readability_score(avg_sentence_length, formality)
    }

def detect_thai_formality(text: str) -> str:
    """
    Detect Thai formality level from particles and word choice
    """
    formal_particles = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
    informal_particles = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย']

    formal_count = sum(text.count(p) for p in formal_particles)
    informal_count = sum(text.count(p) for p in informal_particles)

    total = formal_count + informal_count
    ratio = formal_count / total if total > 0 else 0.5

    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    elif ratio < 0.4:
        return "กันเอง (Casual)"
    else:
        return "ปกติ (Normal)"
```

### **Content Quality Scorer:**

```python
def calculate_quality_score(text: str, keyword: str, brand_voice: Dict) -> Dict:
    """
    Calculate overall content quality score (0-100)

    Categories:
    - Keyword Optimization: 25 points
    - Readability: 25 points
    - Content Structure: 25 points
    - Brand Voice Alignment: 25 points
    """
    scores = {
        'keyword_optimization': score_keyword_optimization(text, keyword),
        'readability': score_readability(text),
        'structure': score_structure(text),
        'brand_voice': score_brand_voice(text, brand_voice)
    }
    total = sum(scores.values())

    return {
        'overall_score': round(total, 1),
        'categories': scores,
        'status': get_quality_status(total),
        'recommendations': get_quality_recommendations(scores)
    }

def score_keyword_optimization(text: str, keyword: str) -> float:
    """Score keyword optimization (0-25)"""
    density = calculate_density(text, keyword)
    placements = check_critical_placements(text, keyword)

    score = 0

    # Density score (10 points)
    if 1.0 <= density <= 1.5:
        score += 10
    elif 0.5 <= density < 1.0 or 1.5 < density <= 2.0:
        score += 5

    # Critical placements (15 points)
    if placements['in_first_100_words']:
        score += 5
    if placements['in_h1']:
        score += 5
    if placements['in_conclusion']:
        score += 5

    return score
```

---

## 📁 Commands

### **Analyze Keyword Density:**

```bash
python3 skills/seo-analyzers/scripts/thai_keyword_analyzer.py \
  --text "บทความเกี่ยวกับบริการ podcast hosting..." \
  --keyword "บริการ podcast" \
  --language th
```

### **Score Content Quality:**

```bash
python3 skills/seo-analyzers/scripts/content_quality_scorer.py \
  --file drafts/article.md \
  --keyword "podcast hosting" \
  --context "./website/context/"
```

### **Check Readability:**

```bash
python3 skills/seo-analyzers/scripts/thai_readability.py \
  --text "เนื้อหาบทความภาษาไทย..." \
  --language th
```

### **Clean AI Patterns:**

```bash
python3 skills/seo-analyzers/scripts/content_scrubber_thai.py \
  --file drafts/ai-generated.md \
  --output drafts/cleaned.md \
  --verbose
```

---

## ⚙️ Environment Variables

**Optional (in unified .env):**

```bash
# No API keys required for seo-analyzers
# All processing is local with PyThaiNLP

# Optional: For advanced NLP
NLTK_DATA_PATH=/path/to/nltk_data
```

---

## 📊 Output Examples

### **Keyword Analysis Output:**

```json
{
  "word_count": 1847,
  "keyword": "บริการ podcast",
  "occurrences": 23,
  "density": 1.25,
  "status": "optimal",
  "critical_placements": {
    "in_first_100_words": true,
    "in_h1": true,
    "in_conclusion": true,
    "in_h2_count": 3
  },
  "keyword_stuffing_risk": "none",
  "recommendations": []
}
```

### **Readability Output:**

```json
{
  "avg_sentence_length": 18.5,
  "grade_level": "ปานกลาง (ม.ปลาย: ม.4-ม.6)",
  "formality": "ปกติ (Normal)",
  "score": 75,
  "details": {
    "sentence_count": 98,
    "paragraph_count": 24,
    "avg_paragraph_length": 4.1
  },
  "recommendations": [
    "ลดความยาวประโยคบ้าง (บางประโยคยาวเกินไป)",
    "รักษาระดับความเป็นกันเองนี้ไว้"
  ]
}
```

### **Quality Score Output:**

```json
{
  "overall_score": 82.5,
  "categories": {
    "keyword_optimization": 22.5,
    "readability": 20.0,
    "structure": 23.0,
    "brand_voice": 17.0
  },
  "status": "good",
  "publishing_readiness": "Ready with minor tweaks",
  "priority_fixes": [
    "ปรับ brand voice ให้เป็นกันเองมากขึ้น",
    "เพิ่ม internal links 2-3 แห่ง"
  ],
  "recommendations": [
    "เพิ่มคำหลักใน H2 อีก 1-2 แห่ง",
    "ย่อหน้าบางตอนยาวเกินไป แบ่งออกเป็น 2 ย่อหน้า"
  ]
}
```

---

## ✅ Quality Thresholds

| Score Range | Status | Action |
|-------------|--------|--------|
| 90-100 | Excellent | Publish immediately |
| 80-89 | Good | Minor tweaks, publishable |
| 70-79 | Fair | Address priority fixes |
| Below 70 | Needs Work | Significant improvements required |

---

## ⚠️ Important Notes

1. **Thai Word Counting:** Uses PyThaiNLP for accurate counting (no spaces between Thai words)
2. **Formality Detection:** Auto-detects from particles (ครับ/ค่ะ vs นะ/จ้ะ)
3. **Keyword Density:** Thai target is 1.0-1.5% (lower than the English 1.5-2.0%)
4. **Readability:** Thai school levels (ม.ต้น / ม.ปลาย / อุดมศึกษา) instead of Flesch scores
5. **AI Patterns:** Thai-specific patterns (overly formal, repetitive structures)

---

## 🔄 Integration with Other Skills

- **seo-multi-channel:** Calls for quality scoring before output
- **seo-context:** Loads brand voice for alignment scoring
- **website-creator:** Validates content before publishing

---

**Use this skill when you need to analyze content quality, check keyword density, or clean AI patterns from Thai or English content.**
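The helpers `get_density_status` and `get_quality_status` are referenced in the code above but never shown. A minimal sketch of what they might look like, assuming the band edges follow the Thai density target (1.0-1.5%) and the quality-thresholds table (90/80/70) directly; the exact status labels are assumptions:

```python
def get_density_status(density: float) -> str:
    """Map keyword density (%) to a status, using the Thai target
    range of 1.0-1.5% described above (band edges assumed)."""
    if density < 1.0:
        return "too_low"
    if density <= 1.5:
        return "optimal"
    return "too_high"

def get_quality_status(score: float) -> str:
    """Map an overall 0-100 score to the quality-thresholds table."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "needs_work"

print(get_density_status(1.25))   # optimal
print(get_quality_status(82.5))   # good
```

The example inputs (1.25% density, 82.5 score) match the JSON outputs shown earlier, so the sketch is consistent with those examples.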
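Because `detect_thai_formality` counts particle substrings with plain `str.count`, it needs no tokenizer and can be exercised standalone, without PyThaiNLP installed. A small usage sketch with the particle lists and 0.4/0.6 ratio thresholds copied from the scorer above (the sample sentences are illustrative):

```python
# Particle lists as in the readability scorer above
FORMAL = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
INFORMAL = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย']

def formality_label(text: str) -> str:
    """Classify formality from the formal/informal particle ratio."""
    formal = sum(text.count(p) for p in FORMAL)
    informal = sum(text.count(p) for p in INFORMAL)
    total = formal + informal
    ratio = formal / total if total else 0.5  # neutral when no particles
    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    if ratio < 0.4:
        return "กันเอง (Casual)"
    return "ปกติ (Normal)"

print(formality_label("สวัสดีครับ ยินดีต้อนรับครับ"))  # เป็นทางการ (Formal)
print(formality_label("ไปไหนมาอ่ะ กินข้าวยังนะ"))      # กันเอง (Casual)
```

Substring counting is a coarse heuristic (particles can appear inside longer words), which is one reason the full scorer pairs it with PyThaiNLP tokenization for the word-level metrics.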