
name: seo-analyzers
description: Analyze content quality with Thai language support. Use for keyword density, readability scoring, SEO quality rating (0-100), and AI pattern detection.

🔍 SEO Analyzers - Thai Language Content Analysis

Skill Name: seo-analyzers
Category: quick
Load Skills: []


🚀 Purpose

Analyze content quality with full Thai language support:

  • Thai keyword density - PyThaiNLP-based word counting
  • Thai readability scoring - Grade level, formality detection
  • Content quality rating - Overall 0-100 score
  • AI pattern detection - Remove AI watermarks (Thai-aware)
  • Search intent analysis - Classify Thai queries

Use Cases:

  1. Analyze blog post quality before publishing
  2. Check keyword density for Thai content
  3. Score content quality (0-100)
  4. Remove AI patterns from generated content
  5. Analyze search intent for Thai keywords

📋 Pre-Flight Questions

MUST ask before analyzing:

  1. Content to Analyze:

    • Text content (paste directly)
    • File path (Markdown, TXT)
    • URL (fetch and analyze)
  2. Analysis Type: (Default: All)

    • Keyword density
    • Readability score
    • Quality rating (0-100)
    • AI pattern detection
    • Search intent
  3. Target Keyword: (For keyword analysis)

    • Primary keyword
    • Secondary keywords (optional)
  4. Content Language: (Auto-detect or specify)

    • Thai
    • English
    • Auto-detect

🔄 Workflows

Workflow 1: Keyword Density Analysis

Input: Article text + target keyword
Process:
  1. Count Thai words (PyThaiNLP)
  2. Calculate keyword density
  3. Check critical placements (H1, first 100 words, conclusion)
  4. Detect keyword stuffing
Output:
  - Word count
  - Keyword occurrences
  - Density percentage
  - Status (too_low/optimal/too_high)
  - Recommendations

Workflow 2: Readability Scoring

Input: Article text
Process:
  1. Count sentences (Thai-aware)
  2. Calculate average sentence length
  3. Detect formality level (Thai particles)
  4. Estimate grade level
Output:
  - Avg sentence length
  - Grade level (e.g. ม.6-ม.12)
  - Formality score (กันเอง/ปกติ/เป็นทางการ)
  - Readability recommendations

Workflow 3: Quality Rating (0-100)

Input: Article text + keyword
Process:
  1. Keyword optimization (25 points)
  2. Readability (25 points)
  3. Content structure (25 points)
  4. Brand voice alignment (25 points)
Output:
  - Overall score (0-100)
  - Category breakdowns
  - Priority fixes
  - Publishing readiness status

Workflow 4: AI Pattern Detection

Input: Generated content
Process:
  1. Remove Unicode watermarks (zero-width spaces)
  2. Replace em-dashes with appropriate punctuation
  3. Detect AI patterns (repetitive structures)
  4. Thai-specific patterns (overly formal language)
Output:
  - Cleaned content
  - Statistics (chars removed, patterns fixed)
  - AI probability score

🔧 Technical Implementation

Thai Keyword Analyzer:

from typing import Dict

from pythainlp import word_tokenize
from pythainlp.util import normalize

def count_thai_words(text: str) -> int:
    """Count Thai words accurately (no spaces between words)"""
    tokens = word_tokenize(text, engine="newmm")
    return len([t for t in tokens if t.strip() and not t.isspace()])

def calculate_density(text: str, keyword: str) -> float:
    """Calculate keyword density for Thai text"""
    text_norm = normalize(text)
    keyword_norm = normalize(keyword)
    count = text_norm.count(keyword_norm)
    word_count = count_thai_words(text)
    return (count / word_count * 100) if word_count > 0 else 0

def check_critical_placements(text: str, keyword: str) -> Dict:
    """Check keyword in critical locations"""
    return {
        'in_first_100_words': keyword in text[:200],  # Thai chars are longer
        'in_h1': check_h1(text, keyword),
        'in_conclusion': keyword in text[-500:],
        'density_status': get_density_status(calculate_density(text, keyword))
    }
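
The snippet above calls two helpers that are not defined here. A minimal sketch, assuming the density thresholds from the notes below (Thai target 1.0-1.5%) and Markdown-style headings; the logic in the shipped scripts may differ:

```python
import re

def get_density_status(density: float) -> str:
    """Map a density percentage to a status label (Thai target: 1.0-1.5%)."""
    if density < 1.0:
        return "too_low"
    if density <= 1.5:
        return "optimal"
    return "too_high"

def check_h1(text: str, keyword: str) -> bool:
    """Check whether the keyword appears in the first Markdown H1 heading."""
    match = re.search(r"^#\s+(.+)$", text, flags=re.MULTILINE)
    return bool(match and keyword in match.group(1))
```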

Thai Readability Scorer:

from typing import Dict

from pythainlp import sent_tokenize, word_tokenize

def calculate_thai_readability(text: str) -> Dict:
    """
    Thai readability scoring (adapted for Thai language)
    
    Thai doesn't have spaces between words, so we use:
    - Average sentence length (words per sentence)
    - Presence of formal/informal particles
    - Paragraph structure
    """
    sentences = sent_tokenize(text, engine="whitespace")  # Thai marks sentence breaks with spaces
    total_words = sum(len(word_tokenize(s, engine="newmm")) for s in sentences)
    avg_sentence_length = total_words / len(sentences) if sentences else 0
    
    # Detect formality level
    formality = detect_thai_formality(text)
    
    # Estimate grade level
    if avg_sentence_length < 15:
        grade_level = "ง่าย (ม.6-ม.9)"
    elif avg_sentence_length < 25:
        grade_level = "ปานกลาง (ม.10-ม.12)"
    else:
        grade_level = "ยาก (ม.13+)"
    
    return {
        'avg_sentence_length': round(avg_sentence_length, 1),
        'grade_level': grade_level,
        'formality': formality,
        'score': calculate_readability_score(avg_sentence_length, formality)
    }

def detect_thai_formality(text: str) -> str:
    """
    Detect Thai formality level from particles and word choice
    """
    formal_particles = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
    informal_particles = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย', 'เนอะ', 'ป่ะ']
    
    formal_count = sum(text.count(p) for p in formal_particles)
    informal_count = sum(text.count(p) for p in informal_particles)
    
    ratio = formal_count / (formal_count + informal_count) if (formal_count + informal_count) > 0 else 0.5
    
    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    elif ratio < 0.4:
        return "กันเอง (Casual)"
    else:
        return "ปกติ (Normal)"
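
The scorer above returns its final number via `calculate_readability_score`, which is not shown. One plausible sketch that weights sentence length and adds a small bonus for casual/normal tone; the weights are illustrative assumptions, not the shipped formula:

```python
def calculate_readability_score(avg_sentence_length: float, formality: str) -> int:
    """Combine sentence length and formality into a 0-100 readability score."""
    # Length component: 100 at <=10 words/sentence, ~3 points off per extra word
    length_score = max(0.0, min(100.0, 100 - (avg_sentence_length - 10) * 3))
    # Tone bonus: casual (กันเอง) or normal (ปกติ) text reads easier than formal
    tone_bonus = 10 if ("กันเอง" in formality or "ปกติ" in formality) else 0
    return int(min(100.0, length_score * 0.9 + tone_bonus))
```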

Content Quality Scorer:

from typing import Dict

def calculate_quality_score(text: str, keyword: str, brand_voice: Dict) -> Dict:
    """
    Calculate overall content quality score (0-100)
    
    Categories:
    - Keyword Optimization: 25 points
    - Readability: 25 points
    - Content Structure: 25 points
    - Brand Voice Alignment: 25 points
    """
    scores = {
        'keyword_optimization': score_keyword_optimization(text, keyword),
        'readability': score_readability(text),
        'structure': score_structure(text),
        'brand_voice': score_brand_voice(text, brand_voice)
    }
    
    total = sum(scores.values())
    
    return {
        'overall_score': round(total, 1),
        'categories': scores,
        'status': get_quality_status(total),
        'recommendations': get_quality_recommendations(scores)
    }

def score_keyword_optimization(text: str, keyword: str) -> float:
    """Score keyword optimization (0-25)"""
    density = calculate_density(text, keyword)
    placements = check_critical_placements(text, keyword)
    
    score = 0
    
    # Density score (10 points)
    if 1.0 <= density <= 1.5:
        score += 10
    elif 0.5 <= density < 1.0 or 1.5 < density <= 2.0:
        score += 5
    
    # Critical placements (15 points)
    if placements['in_first_100_words']:
        score += 5
    if placements['in_h1']:
        score += 5
    if placements['in_conclusion']:
        score += 5
    
    return score
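
AI Pattern Scrubber (sketch):

The first two steps of Workflow 4 (watermark removal and em-dash replacement) can be sketched as follows. The function name, the comma replacement, and the statistics keys are illustrative assumptions, not the actual content_scrubber_thai.py interface:

```python
import re
from typing import Dict, Tuple

# Zero-width / invisible characters sometimes embedded in generated text
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

def scrub_ai_patterns(text: str) -> Tuple[str, Dict[str, int]]:
    """Strip Unicode watermarks and replace em-dashes, returning stats."""
    removed = sum(text.count(ch) for ch in ZERO_WIDTH)
    cleaned = text.translate({ord(ch): None for ch in ZERO_WIDTH})
    # Em-dashes are rare in Thai prose; a comma is usually the safer choice
    dashes = cleaned.count("\u2014")
    cleaned = re.sub(r"\s*\u2014\s*", ", ", cleaned)
    return cleaned, {"chars_removed": removed, "em_dashes_fixed": dashes}
```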

📁 Commands

Analyze Keyword Density:

python3 skills/seo-analyzers/scripts/thai_keyword_analyzer.py \
  --text "บทความเกี่ยวกับบริการ podcast hosting..." \
  --keyword "บริการ podcast" \
  --language th

Score Content Quality:

python3 skills/seo-analyzers/scripts/content_quality_scorer.py \
  --file drafts/article.md \
  --keyword "podcast hosting" \
  --context "./website/context/"

Check Readability:

python3 skills/seo-analyzers/scripts/thai_readability.py \
  --text "เนื้อหาบทความภาษาไทย..." \
  --language th

Clean AI Patterns:

python3 skills/seo-analyzers/scripts/content_scrubber_thai.py \
  --file drafts/ai-generated.md \
  --output drafts/cleaned.md \
  --verbose

⚙️ Environment Variables

Optional (in unified .env):

# No API keys required for seo-analyzers
# All processing is local with PyThaiNLP

# Optional: For advanced NLP
NLTK_DATA_PATH=/path/to/nltk_data

📊 Output Examples

Keyword Analysis Output:

{
  "word_count": 1847,
  "keyword": "บริการ podcast",
  "occurrences": 23,
  "density": 1.25,
  "status": "optimal",
  "critical_placements": {
    "in_first_100_words": true,
    "in_h1": true,
    "in_conclusion": true,
    "in_h2_count": 3
  },
  "keyword_stuffing_risk": "none",
  "recommendations": []
}

Readability Output:

{
  "avg_sentence_length": 18.5,
  "grade_level": "ปานกลาง (ม.10-ม.12)",
  "formality": "ปกติ (Normal)",
  "score": 75,
  "details": {
    "sentence_count": 98,
    "paragraph_count": 24,
    "avg_paragraph_length": 4.1
  },
  "recommendations": [
    "ลดความยาวประโยคบ้าง (บางประโยคยาวเกินไป)",
    "รักษาระดับความเป็นกันเองนี้ไว้"
  ]
}

Quality Score Output:

{
  "overall_score": 82.5,
  "categories": {
    "keyword_optimization": 22.5,
    "readability": 20.0,
    "structure": 23.0,
    "brand_voice": 17.0
  },
  "status": "good",
  "publishing_readiness": "Ready with minor tweaks",
  "priority_fixes": [
    "ปรับ brand voice ให้เป็นกันเองมากขึ้น",
    "เพิ่ม internal links 2-3 แห่ง"
  ],
  "recommendations": [
    "เพิ่มคำหลักใน H2 อีก 1-2 แห่ง",
    "ย่อหน้าบางตอนยาวเกินไป แบ่งออกเป็น 2 ย่อหน้า"
  ]
}

Quality Thresholds

Score Range   Status       Action
90-100        Excellent    Publish immediately
80-89         Good         Minor tweaks, publishable
70-79         Fair         Address priority fixes
Below 70      Needs Work   Significant improvements required
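
These thresholds map directly to a small helper; a sketch of `get_quality_status` (the name used in the quality scorer code earlier), with labels chosen to match the example output's "good" status:

```python
def get_quality_status(score: float) -> str:
    """Map an overall 0-100 score to the publishing-readiness tiers."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "needs_work"
```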

⚠️ Important Notes

  1. Thai Word Counting: Uses PyThaiNLP for accurate counting (no spaces between Thai words)

  2. Formality Detection: Auto-detects from particles (ครับ/ค่ะ vs นะ/จ้ะ)

  3. Keyword Density: Thai target is 1.0-1.5% (lower than English 1.5-2.0%)

  4. Readability: Thai grade levels (ม.6-ม.12) instead of Flesch scores

  5. AI Patterns: Thai-specific patterns (overly formal, repetitive structures)


🔄 Integration with Other Skills

  • seo-multi-channel: Calls for quality scoring before output
  • seo-context: Loads brand voice for alignment scoring
  • website-creator: Validates content before publishing

Use this skill when you need to analyze content quality, check keyword density, or clean AI patterns from Thai or English content.