---
name: seo-analyzers
description: Analyze content quality with Thai language support. Use for keyword density, readability scoring, SEO quality rating (0-100), and AI pattern detection.
---

# 🔍 SEO Analyzers - Thai Language Content Analysis

**Skill Name:** `seo-analyzers`
**Category:** `quick`
**Load Skills:** `[]`

---

## 🚀 Purpose

Analyze content quality with full Thai language support:

- ✅ **Thai keyword density** - PyThaiNLP-based word counting
- ✅ **Thai readability scoring** - Grade level, formality detection
- ✅ **Content quality rating** - Overall 0-100 score
- ✅ **AI pattern detection** - Remove AI watermarks (Thai-aware)
- ✅ **Search intent analysis** - Classify Thai queries

**Use Cases:**
1. Analyze blog post quality before publishing
2. Check keyword density for Thai content
3. Score content quality (0-100)
4. Remove AI patterns from generated content
5. Analyze search intent for Thai keywords
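
The search-intent analysis listed above has no worked example later in this skill; a minimal rule-based sketch is shown here, assuming simple marker-word lists (the lists and the `classify_intent` name are illustrative, not the shipped implementation):

```python
# Hypothetical marker lists for rule-based intent classification.
# Illustrative only; the real skill may use different rules.
INTENT_MARKERS = {
    "informational": ["วิธี", "คือ", "ทำไม", "อย่างไร", "how", "what"],
    "transactional": ["ซื้อ", "ราคา", "สมัคร", "โปรโมชั่น", "buy", "price"],
    "navigational": ["เว็บไซต์", "login", "เข้าสู่ระบบ", "official"],
}

def classify_intent(query: str) -> str:
    """Return the intent whose marker words appear most often in the query."""
    query = query.lower()
    scores = {
        intent: sum(query.count(m) for m in markers)
        for intent, markers in INTENT_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # Default to informational when no marker matches
    return best if scores[best] > 0 else "informational"
```

Because Thai queries often mix Thai and English (e.g. "บริการ podcast"), the marker lists include both scripts.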

---

## 📋 Pre-Flight Questions

**MUST ask before analyzing:**

1. **Content to Analyze:**
   - Text content (paste directly)
   - File path (Markdown, TXT)
   - URL (fetch and analyze)

2. **Analysis Type:** (Default: All)
   - Keyword density
   - Readability score
   - Quality rating (0-100)
   - AI pattern detection
   - Search intent

3. **Target Keyword:** (For keyword analysis)
   - Primary keyword
   - Secondary keywords (optional)

4. **Content Language:** (Auto-detect or specify)
   - Thai
   - English
   - Auto-detect

---

## 🔄 Workflows

### **Workflow 1: Keyword Density Analysis**

```python
Input: Article text + target keyword
Process:
1. Count Thai words (PyThaiNLP)
2. Calculate keyword density
3. Check critical placements (H1, first 100 words, conclusion)
4. Detect keyword stuffing
Output:
- Word count
- Keyword occurrences
- Density percentage
- Status (too_low/optimal/too_high)
- Recommendations
```
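
The `status` value above can be derived from the Thai density targets this skill uses (1.0-1.5% optimal; see Important Notes). A minimal sketch of a `get_density_status` helper, with the 2.5% stuffing cutoff as an assumption:

```python
def get_density_status(density: float) -> str:
    """Map a keyword density percentage to a status label.

    Thresholds follow this skill's Thai targets (1.0-1.5% optimal);
    the keyword-stuffing cutoff of 2.5% is an assumption.
    """
    if density < 1.0:
        return "too_low"
    if density <= 1.5:
        return "optimal"
    if density <= 2.5:
        return "too_high"
    return "keyword_stuffing"
```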

### **Workflow 2: Readability Scoring**

```python
Input: Article text
Process:
1. Count sentences (Thai-aware)
2. Calculate average sentence length
3. Detect formality level (Thai particles)
4. Estimate grade level
Output:
- Avg sentence length
- Grade level (ม.6-ม.12 or 8-10)
- Formality score (กันเอง/ปกติ/เป็นทางการ)
- Readability recommendations
```

### **Workflow 3: Quality Rating (0-100)**

```python
Input: Article text + keyword
Process:
1. Keyword optimization (25 points)
2. Readability (25 points)
3. Content structure (25 points)
4. Brand voice alignment (25 points)
Output:
- Overall score (0-100)
- Category breakdowns
- Priority fixes
- Publishing readiness status
```

### **Workflow 4: AI Pattern Detection**

```python
Input: Generated content
Process:
1. Remove Unicode watermarks (zero-width spaces)
2. Replace em-dashes with appropriate punctuation
3. Detect AI patterns (repetitive structures)
4. Detect Thai-specific patterns (overly formal language)
Output:
- Cleaned content
- Statistics (chars removed, patterns fixed)
- AI probability score
```
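
Steps 1-2 of this workflow can be sketched with the standard library alone; this is a simplified illustration, not the full `content_scrubber_thai.py`:

```python
# Zero-width and BOM code points sometimes used as AI watermarks
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

def scrub_ai_patterns(text: str):
    """Remove zero-width characters and normalize em-dashes.

    Covers only steps 1-2 of the workflow; pattern detection
    (steps 3-4) needs language-aware analysis.
    """
    no_zw = text.translate({ord(c): None for c in ZERO_WIDTH})
    stats = {
        "zero_width_removed": len(text) - len(no_zw),
        "em_dashes_replaced": no_zw.count("\u2014"),
    }
    # Replace em-dashes with a spaced hyphen (simple heuristic;
    # "appropriate punctuation" depends on context)
    cleaned = no_zw.replace("\u2014", " - ")
    return cleaned, stats
```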

---

## 🔧 Technical Implementation

### **Thai Keyword Analyzer:**

```python
from typing import Dict

from pythainlp import word_tokenize
from pythainlp.util import normalize

def count_thai_words(text: str) -> int:
    """Count Thai words accurately (Thai has no spaces between words)"""
    tokens = word_tokenize(text, engine="newmm")
    # Drop whitespace-only tokens
    return len([t for t in tokens if t.strip()])

def calculate_density(text: str, keyword: str) -> float:
    """Calculate keyword density (%) for Thai text"""
    text_norm = normalize(text)
    keyword_norm = normalize(keyword)
    count = text_norm.count(keyword_norm)
    word_count = count_thai_words(text)
    return (count / word_count * 100) if word_count > 0 else 0

def check_critical_placements(text: str, keyword: str) -> Dict:
    """Check keyword in critical locations"""
    # check_h1 and get_density_status are helpers defined in the same script
    return {
        'in_first_100_words': keyword in text[:200],  # Thai words span more chars
        'in_h1': check_h1(text, keyword),
        'in_conclusion': keyword in text[-500:],
        'density_status': get_density_status(calculate_density(text, keyword))
    }
```

### **Thai Readability Scorer:**

```python
from typing import Dict

from pythainlp import sent_tokenize, word_tokenize

def calculate_thai_readability(text: str) -> Dict:
    """
    Thai readability scoring (adapted for Thai language)

    Thai doesn't have spaces between words, so we use:
    - Average sentence length (words per sentence)
    - Presence of formal/informal particles
    - Paragraph structure
    """
    # crfcut is PyThaiNLP's Thai-aware sentence segmenter
    sentences = sent_tokenize(text, engine="crfcut")
    total_words = sum(len(word_tokenize(s, engine="newmm")) for s in sentences)
    avg_sentence_length = total_words / len(sentences) if sentences else 0

    # Detect formality level
    formality = detect_thai_formality(text)

    # Estimate grade level
    if avg_sentence_length < 15:
        grade_level = "ง่าย (ม.6-ม.9)"
    elif avg_sentence_length < 25:
        grade_level = "ปานกลาง (ม.10-ม.12)"
    else:
        grade_level = "ยาก (ม.13+)"

    return {
        'avg_sentence_length': round(avg_sentence_length, 1),
        'grade_level': grade_level,
        'formality': formality,
        'score': calculate_readability_score(avg_sentence_length, formality)
    }

def detect_thai_formality(text: str) -> str:
    """
    Detect Thai formality level from particles and word choice
    """
    formal_particles = ['ครับ', 'ค่ะ', 'ข้าพเจ้า', 'ท่าน', 'ซึ่ง', 'อัน']
    informal_particles = ['นะ', 'จ้ะ', 'อ่ะ', 'มั้ย', 'เนอะ', 'จ้า']

    formal_count = sum(text.count(p) for p in formal_particles)
    informal_count = sum(text.count(p) for p in informal_particles)

    total = formal_count + informal_count
    ratio = formal_count / total if total > 0 else 0.5

    if ratio > 0.6:
        return "เป็นทางการ (Formal)"
    elif ratio < 0.4:
        return "กันเอง (Casual)"
    else:
        return "ปกติ (Normal)"
```
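
`calculate_readability_score` is called above but defined elsewhere in the script; one possible stdlib-only sketch, with the penalty weights as assumptions:

```python
def calculate_readability_score(avg_sentence_length: float, formality: str) -> int:
    """Score readability 0-100: shorter sentences score higher.

    Weights are assumptions: 3 points off per word beyond 15 words
    per sentence, plus a small bonus for a neutral register.
    """
    score = 100 - max(0.0, avg_sentence_length - 15) * 3
    if "Normal" in formality:
        score += 5  # neutral tone is easiest to read
    return int(max(0, min(100, score)))
```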

### **Content Quality Scorer:**

```python
from typing import Dict

def calculate_quality_score(text: str, keyword: str, brand_voice: Dict) -> Dict:
    """
    Calculate overall content quality score (0-100)

    Categories:
    - Keyword Optimization: 25 points
    - Readability: 25 points
    - Content Structure: 25 points
    - Brand Voice Alignment: 25 points
    """
    scores = {
        'keyword_optimization': score_keyword_optimization(text, keyword),
        'readability': score_readability(text),
        'structure': score_structure(text),
        'brand_voice': score_brand_voice(text, brand_voice)
    }

    total = sum(scores.values())

    return {
        'overall_score': round(total, 1),
        'categories': scores,
        'status': get_quality_status(total),
        'recommendations': get_quality_recommendations(scores)
    }

def score_keyword_optimization(text: str, keyword: str) -> float:
    """Score keyword optimization (0-25)"""
    density = calculate_density(text, keyword)
    placements = check_critical_placements(text, keyword)

    score = 0

    # Density score (10 points)
    if 1.0 <= density <= 1.5:
        score += 10
    elif 0.5 <= density < 1.0 or 1.5 < density <= 2.0:
        score += 5

    # Critical placements (15 points)
    if placements['in_first_100_words']:
        score += 5
    if placements['in_h1']:
        score += 5
    if placements['in_conclusion']:
        score += 5

    return score
```
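
`score_readability`, `score_structure`, `score_brand_voice`, and the recommendation helpers are likewise defined elsewhere in the script. As an illustration only, `score_structure` could look like this for Markdown input (the heading and paragraph thresholds are assumptions):

```python
import re

def score_structure(text: str) -> float:
    """Score content structure 0-25 from Markdown headings and paragraphs."""
    h2_count = len(re.findall(r"^## ", text, flags=re.MULTILINE))
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    score = 0.0
    if h2_count >= 3:          # enough subheadings to scan
        score += 10
    elif h2_count >= 1:
        score += 5
    if len(paragraphs) >= 5:   # content is broken into paragraphs
        score += 10
    elif len(paragraphs) >= 2:
        score += 5
    # Short paragraphs are easier to read on mobile
    if paragraphs and max(len(p) for p in paragraphs) < 800:
        score += 5
    return score
```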

---

## 📁 Commands

### **Analyze Keyword Density:**

```bash
python3 skills/seo-analyzers/scripts/thai_keyword_analyzer.py \
  --text "บทความเกี่ยวกับบริการ podcast hosting..." \
  --keyword "บริการ podcast" \
  --language th
```

### **Score Content Quality:**

```bash
python3 skills/seo-analyzers/scripts/content_quality_scorer.py \
  --file drafts/article.md \
  --keyword "podcast hosting" \
  --context "./website/context/"
```

### **Check Readability:**

```bash
python3 skills/seo-analyzers/scripts/thai_readability.py \
  --text "เนื้อหาบทความภาษาไทย..." \
  --language th
```

### **Clean AI Patterns:**

```bash
python3 skills/seo-analyzers/scripts/content_scrubber_thai.py \
  --file drafts/ai-generated.md \
  --output drafts/cleaned.md \
  --verbose
```

---

## ⚙️ Environment Variables

**Optional (in unified .env):**

```bash
# No API keys required for seo-analyzers
# All processing is local with PyThaiNLP

# Optional: For advanced NLP
NLTK_DATA_PATH=/path/to/nltk_data
```

---

## 📊 Output Examples

### **Keyword Analysis Output:**

```json
{
  "word_count": 1847,
  "keyword": "บริการ podcast",
  "occurrences": 23,
  "density": 1.25,
  "status": "optimal",
  "critical_placements": {
    "in_first_100_words": true,
    "in_h1": true,
    "in_conclusion": true,
    "in_h2_count": 3
  },
  "keyword_stuffing_risk": "none",
  "recommendations": []
}
```

### **Readability Output:**

```json
{
  "avg_sentence_length": 18.5,
  "grade_level": "ปานกลาง (ม.10-ม.12)",
  "formality": "ปกติ (Normal)",
  "score": 75,
  "details": {
    "sentence_count": 98,
    "paragraph_count": 24,
    "avg_paragraph_length": 4.1
  },
  "recommendations": [
    "ลดความยาวประโยคบ้าง (บางประโยคยาวเกินไป)",
    "รักษาระดับความเป็นกันเองนี้ไว้"
  ]
}
```

### **Quality Score Output:**

```json
{
  "overall_score": 82.5,
  "categories": {
    "keyword_optimization": 22.5,
    "readability": 20.0,
    "structure": 23.0,
    "brand_voice": 17.0
  },
  "status": "good",
  "publishing_readiness": "Ready with minor tweaks",
  "priority_fixes": [
    "ปรับ brand voice ให้เป็นกันเองมากขึ้น",
    "เพิ่ม internal links 2-3 แห่ง"
  ],
  "recommendations": [
    "เพิ่มคำหลักใน H2 อีก 1-2 แห่ง",
    "ย่อหน้าบางตอนยาวเกินไป แบ่งออกเป็น 2 ย่อหน้า"
  ]
}
```

---

## ✅ Quality Thresholds

| Score Range | Status | Action |
|-------------|--------|--------|
| 90-100 | Excellent | Publish immediately |
| 80-89 | Good | Minor tweaks, publishable |
| 70-79 | Fair | Address priority fixes |
| Below 70 | Needs Work | Significant improvements required |
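
These thresholds map directly onto the `get_quality_status` helper used by the quality scorer; a minimal sketch (the snake_case labels are assumptions, chosen to match the `"status": "good"` example above):

```python
def get_quality_status(score: float) -> str:
    """Map an overall 0-100 score to the status labels in the table."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "needs_work"
```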

---

## ⚠️ Important Notes

1. **Thai Word Counting:** Uses PyThaiNLP for accurate counting (no spaces between Thai words)

2. **Formality Detection:** Auto-detects from particles (ครับ/ค่ะ vs นะ/จ้ะ)

3. **Keyword Density:** Thai target is 1.0-1.5% (lower than English 1.5-2.0%)

4. **Readability:** Thai grade levels (ม.6-ม.12) instead of Flesch scores

5. **AI Patterns:** Thai-specific patterns (overly formal, repetitive structures)

---

## 🔄 Integration with Other Skills

- **seo-multi-channel:** Calls for quality scoring before output
- **seo-context:** Loads brand voice for alignment scoring
- **website-creator:** Validates content before publishing

---

**Use this skill when you need to analyze content quality, check keyword density, or clean AI patterns from Thai or English content.**