
name: seo-analyzers
description: Analyze content quality with Thai language support. Use for keyword density, readability scoring, SEO quality rating (0-100), and AI pattern detection.

🔍 SEO Analyzers - Thai Language Content Analysis

Skill Name: seo-analyzers
Category: quick
Load Skills: []


🚀 Purpose

Analyze content quality with full Thai language support:

  • Thai keyword density - PyThaiNLP-based word counting
  • Thai readability scoring - Grade level, formality detection
  • Content quality rating - Overall 0-100 score
  • AI pattern detection - Remove AI watermarks (Thai-aware)
  • Search intent analysis - Classify Thai queries

Use Cases:

  1. Analyze blog post quality before publishing
  2. Check keyword density for Thai content
  3. Score content quality (0-100)
  4. Remove AI patterns from generated content
  5. Analyze search intent for Thai keywords

📋 Pre-Flight Questions

MUST ask before analyzing:

  1. Content to Analyze:

    • Text content (paste directly)
    • File path (Markdown, TXT)
    • URL (fetch and analyze)
  2. Analysis Type: (Default: All)

    • Keyword density
    • Readability score
    • Quality rating (0-100)
    • AI pattern detection
    • Search intent
  3. Target Keyword: (For keyword analysis)

    • Primary keyword
    • Secondary keywords (optional)
  4. Content Language: (Auto-detect or specify)

    • Thai
    • English
    • Auto-detect

🔄 Workflows

Workflow 1: Keyword Density Analysis

Input: Article text + target keyword
Process:
  1. Count Thai words (PyThaiNLP)
  2. Calculate keyword density
  3. Check critical placements (H1, first 100 words, conclusion)
  4. Detect keyword stuffing
Output:
  - Word count
  - Keyword occurrences
  - Density percentage
  - Status (too_low/optimal/too_high)
  - Recommendations

Workflow 2: Readability Scoring

Input: Article text
Process:
  1. Count sentences (Thai-aware)
  2. Calculate average sentence length
  3. Detect formality level (Thai particles)
  4. Estimate grade level
Output:
  - Avg sentence length
  - Grade level (e.g. ม.6-ม.12)
  - Formality score (กันเอง/ปกติ/เป็นทางการ)
  - Readability recommendations

Workflow 3: Quality Rating (0-100)

Input: Article text + keyword
Process:
  1. Keyword optimization (25 points)
  2. Readability (25 points)
  3. Content structure (25 points)
  4. Brand voice alignment (25 points)
Output:
  - Overall score (0-100)
  - Category breakdowns
  - Priority fixes
  - Publishing readiness status

Workflow 4: AI Pattern Detection

Input: Generated content
Process:
  1. Remove Unicode watermarks (zero-width spaces)
  2. Replace em-dashes with appropriate punctuation
  3. Detect AI patterns (repetitive structures)
  4. Thai-specific patterns (overly formal language)
Output:
  - Cleaned content
  - Statistics (chars removed, patterns fixed)
  - AI probability score

🔧 Technical Implementation

Thai Keyword Analyzer:

from typing import Dict

from pythainlp import word_tokenize
from pythainlp.util import normalize

def count_thai_words(text: str) -> int:
    """Count Thai words accurately (no spaces between words)"""
    tokens = word_tokenize(text, engine="newmm")
    return len([t for t in tokens if t.strip() and not t.isspace()])

def calculate_density(text: str, keyword: str) -> float:
    """Calculate keyword density for Thai text"""
    text_norm = normalize(text)
    keyword_norm = normalize(keyword)
    count = text_norm.count(keyword_norm)
    word_count = count_thai_words(text)
    return (count / word_count * 100) if word_count > 0 else 0

def check_critical_placements(text: str, keyword: str) -> Dict:
    """Check keyword in critical locations"""
    return {
        'in_first_100_words': keyword in text[:200],  # Thai chars are longer
        'in_h1': check_h1(text, keyword),
        'in_conclusion': keyword in text[-500:],
        'density_status': get_density_status(calculate_density(text, keyword))
    }
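
The snippet above calls two helpers that are not defined here. A minimal sketch, assuming the density thresholds from the notes below (Thai target 1.0-1.5%) and Markdown-style headings; the logic in the shipped scripts may differ:

```python
import re

def get_density_status(density: float) -> str:
    """Map a density percentage to a status label (Thai target: 1.0-1.5%)."""
    if density < 1.0:
        return "too_low"
    if density <= 1.5:
        return "optimal"
    return "too_high"

def check_h1(text: str, keyword: str) -> bool:
    """Check whether the keyword appears in the first Markdown H1 heading."""
    match = re.search(r"^#\s+(.+)$", text, flags=re.MULTILINE)
    return bool(match and keyword in match.group(1))
```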

Thai Readability Scorer:

from typing import Dict

from pythainlp import sent_tokenize, word_tokenize

def calculate_thai_readability(text: str) -> Dict:
    """
    Thai readability scoring (adapted for Thai language)
    
    Thai doesn't have spaces between words, so we use:
    - Average sentence length (words per sentence)
    - Presence of formal/informal particles
    - Paragraph structure
    """
    sentences = sent_tokenize(text, engine="whitespace")  # Thai marks sentence breaks with spaces
    total_words = sum(len(word_tokenize(s, engine="newmm")) for s in sentences)
    avg_sentence_length = total_words / len(sentences) if sentences else 0
    
    # Detect formality level
    formality = detect_thai_formality(text)
    
    # Estimate grade level
    if avg_sentence_length < 15:
        grade_level = "ง่าย (ม.6-ม.9)"
    elif avg_sentence_length < 25:
        grade_level = "ปานกลาง (ม.10-ม.12)"
    else:
        grade_level = "ยาก (ม.13+)"
    
    return {
        'avg_sentence_length': round(avg_sentence_length, 1),
        'grade_level': grade_level,
        'formality': formality,
        'score': calculate_readability_score(avg_sentence_length, formality)
    }

def detect_thai_formality(text: str) -> str:
    """
    Detect Thai formality level from particles and word choice
    """
    formal_particles = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
    informal_particles = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย', 'เนอะ', 'ป่ะ']
    
    formal_count = sum(text.count(p) for p in formal_particles)
    informal_count = sum(text.count(p) for p in informal_particles)
    
    ratio = formal_count / (formal_count + informal_count) if (formal_count + informal_count) > 0 else 0.5
    
    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    elif ratio < 0.4:
        return "กันเอง (Casual)"
    else:
        return "ปกติ (Normal)"
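
The scorer above returns its final number via `calculate_readability_score`, which is not shown. One plausible sketch that weights sentence length and adds a small bonus for casual/normal tone; the weights are illustrative assumptions, not the shipped formula:

```python
def calculate_readability_score(avg_sentence_length: float, formality: str) -> int:
    """Combine sentence length and formality into a 0-100 readability score."""
    # Length component: 100 at <=10 words/sentence, ~3 points off per extra word
    length_score = max(0.0, min(100.0, 100 - (avg_sentence_length - 10) * 3))
    # Tone bonus: casual (กันเอง) or normal (ปกติ) text reads easier than formal
    tone_bonus = 10 if ("กันเอง" in formality or "ปกติ" in formality) else 0
    return int(min(100.0, length_score * 0.9 + tone_bonus))
```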

Content Quality Scorer:

from typing import Dict

def calculate_quality_score(text: str, keyword: str, brand_voice: Dict) -> Dict:
    """
    Calculate overall content quality score (0-100)
    
    Categories:
    - Keyword Optimization: 25 points
    - Readability: 25 points
    - Content Structure: 25 points
    - Brand Voice Alignment: 25 points
    """
    scores = {
        'keyword_optimization': score_keyword_optimization(text, keyword),
        'readability': score_readability(text),
        'structure': score_structure(text),
        'brand_voice': score_brand_voice(text, brand_voice)
    }
    
    total = sum(scores.values())
    
    return {
        'overall_score': round(total, 1),
        'categories': scores,
        'status': get_quality_status(total),
        'recommendations': get_quality_recommendations(scores)
    }

def score_keyword_optimization(text: str, keyword: str) -> float:
    """Score keyword optimization (0-25)"""
    density = calculate_density(text, keyword)
    placements = check_critical_placements(text, keyword)
    
    score = 0
    
    # Density score (10 points)
    if 1.0 <= density <= 1.5:
        score += 10
    elif 0.5 <= density < 1.0 or 1.5 < density <= 2.0:
        score += 5
    
    # Critical placements (15 points)
    if placements['in_first_100_words']:
        score += 5
    if placements['in_h1']:
        score += 5
    if placements['in_conclusion']:
        score += 5
    
    return score
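
AI Pattern Scrubber (sketch):

The first two steps of Workflow 4 (watermark removal and em-dash replacement) can be sketched as follows. The function name, the comma replacement, and the statistics keys are illustrative assumptions, not the actual content_scrubber_thai.py interface:

```python
import re
from typing import Dict, Tuple

# Zero-width / invisible characters sometimes embedded in generated text
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

def scrub_ai_patterns(text: str) -> Tuple[str, Dict[str, int]]:
    """Strip Unicode watermarks and replace em-dashes, returning stats."""
    removed = sum(text.count(ch) for ch in ZERO_WIDTH)
    cleaned = text.translate({ord(ch): None for ch in ZERO_WIDTH})
    # Em-dashes are rare in Thai prose; a comma is usually the safer choice
    dashes = cleaned.count("\u2014")
    cleaned = re.sub(r"\s*\u2014\s*", ", ", cleaned)
    return cleaned, {"chars_removed": removed, "em_dashes_fixed": dashes}
```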

📁 Commands

Analyze Keyword Density:

python3 skills/seo-analyzers/scripts/thai_keyword_analyzer.py \
  --text "บทความเกี่ยวกับบริการ podcast hosting..." \
  --keyword "บริการ podcast" \
  --language th

Score Content Quality:

python3 skills/seo-analyzers/scripts/content_quality_scorer.py \
  --file drafts/article.md \
  --keyword "podcast hosting" \
  --context "./website/context/"

Check Readability:

python3 skills/seo-analyzers/scripts/thai_readability.py \
  --text "เนื้อหาบทความภาษาไทย..." \
  --language th

Clean AI Patterns:

python3 skills/seo-analyzers/scripts/content_scrubber_thai.py \
  --file drafts/ai-generated.md \
  --output drafts/cleaned.md \
  --verbose

⚙️ Environment Variables

Optional (in unified .env):

# No API keys required for seo-analyzers
# All processing is local with PyThaiNLP

# Optional: For advanced NLP
NLTK_DATA_PATH=/path/to/nltk_data

📊 Output Examples

Keyword Analysis Output:

{
  "word_count": 1847,
  "keyword": "บริการ podcast",
  "occurrences": 23,
  "density": 1.25,
  "status": "optimal",
  "critical_placements": {
    "in_first_100_words": true,
    "in_h1": true,
    "in_conclusion": true,
    "in_h2_count": 3
  },
  "keyword_stuffing_risk": "none",
  "recommendations": []
}

Readability Output:

{
  "avg_sentence_length": 18.5,
  "grade_level": "ปานกลาง (ม.10-ม.12)",
  "formality": "ปกติ (Normal)",
  "score": 75,
  "details": {
    "sentence_count": 98,
    "paragraph_count": 24,
    "avg_paragraph_length": 4.1
  },
  "recommendations": [
    "ลดความยาวประโยคบ้าง (บางประโยคยาวเกินไป)",
    "รักษาระดับความเป็นกันเองนี้ไว้"
  ]
}

Quality Score Output:

{
  "overall_score": 82.5,
  "categories": {
    "keyword_optimization": 22.5,
    "readability": 20.0,
    "structure": 23.0,
    "brand_voice": 17.0
  },
  "status": "good",
  "publishing_readiness": "Ready with minor tweaks",
  "priority_fixes": [
    "ปรับ brand voice ให้เป็นกันเองมากขึ้น",
    "เพิ่ม internal links 2-3 แห่ง"
  ],
  "recommendations": [
    "เพิ่มคำหลักใน H2 อีก 1-2 แห่ง",
    "ย่อหน้าบางตอนยาวเกินไป แบ่งออกเป็น 2 ย่อหน้า"
  ]
}

Quality Thresholds

Score Range   Status       Action
90-100        Excellent    Publish immediately
80-89         Good         Minor tweaks, publishable
70-79         Fair         Address priority fixes
Below 70      Needs Work   Significant improvements required
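
These thresholds map directly to a small helper; a sketch of `get_quality_status` (the name used in the quality scorer code earlier), with labels chosen to match the example output's "good" status:

```python
def get_quality_status(score: float) -> str:
    """Map an overall 0-100 score to the publishing-readiness tiers."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "needs_work"
```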

⚠️ Important Notes

  1. Thai Word Counting: Uses PyThaiNLP for accurate counting (no spaces between Thai words)

  2. Formality Detection: Auto-detects from particles (ครับ/ค่ะ vs นะ/จ้ะ)

  3. Keyword Density: Thai target is 1.0-1.5% (lower than English 1.5-2.0%)

  4. Readability: Thai grade levels (ม.6-ม.12) instead of Flesch scores

  5. AI Patterns: Thai-specific patterns (overly formal, repetitive structures)


🔄 Integration with Other Skills

  • seo-multi-channel: Calls for quality scoring before output
  • seo-context: Loads brand voice for alignment scoring
  • website-creator: Validates content before publishing

Use this skill when you need to analyze content quality, check keyword density, or clean AI patterns from Thai or English content.