AI Blog Rewriter Updater feature complete
This commit is contained in:
163
lib/ai_writers/blog_rewriter_updater/README.md
Normal file
163
lib/ai_writers/blog_rewriter_updater/README.md
Normal file
@@ -0,0 +1,163 @@
|
||||
# AI Blog Rewriter & Updater
|
||||
|
||||
A powerful AI-powered tool for rewriting and updating existing blog content with improved quality, factual accuracy, and SEO optimization.
|
||||
|
||||
## Features
|
||||
|
||||
### 1. Content Import
|
||||
- **URL Import**: Automatically extract content from any blog URL
|
||||
- **Manual Input**: Paste content directly with title, meta description, and author information
|
||||
- **Smart Content Extraction**: Preserves structure, headings, images, and metadata
|
||||
|
||||
### 2. Content Analysis
|
||||
- **Metrics Analysis**:
|
||||
- Word count
|
||||
- Sentence count
|
||||
- Paragraph count
|
||||
- Average words per sentence
|
||||
- Average sentences per paragraph
|
||||
- **Structure Analysis**:
|
||||
- Heading hierarchy
|
||||
- Content organization
|
||||
- Image analysis
|
||||
- **Age Analysis**:
|
||||
- Content age calculation
|
||||
- Publication date detection
|
||||
|
||||
### 3. Web Research
|
||||
- **Topic Extraction**: Automatically identifies key topics for fact-checking
|
||||
- **Multi-Source Research**: Gathers information from various sources
|
||||
- **Research Depth Control**: Choose between low, medium, and high research depth
|
||||
- **Source Organization**: Categorizes research by topic with source details
|
||||
|
||||
### 4. Rewriting Modes
|
||||
- **Standard Rewrite**: Improve clarity and flow while maintaining core message
|
||||
- **SEO Optimization**: Enhance content for search engines with targeted keywords
|
||||
- **Simplification**: Make complex content more accessible
|
||||
- **Expansion**: Add more details and examples
|
||||
- **Fact Check**: Update outdated information
|
||||
- **Tone Shift**: Change writing style while preserving content
|
||||
- **Modernization**: Update with current information and trends
|
||||
|
||||
### 5. Customization Options
|
||||
- **Tone Selection**:
|
||||
- Professional
|
||||
- Conversational
|
||||
- Academic
|
||||
- Enthusiastic
|
||||
- Authoritative
|
||||
- Friendly
|
||||
- Technical
|
||||
- Inspirational
|
||||
- **Length Control**:
|
||||
- Maintain original length
|
||||
- Create shorter version
|
||||
- Create longer version
|
||||
- Custom word count
|
||||
- **SEO Features**:
|
||||
- Focus keyword optimization
|
||||
- Meta description generation
|
||||
- Title optimization
|
||||
- **Special Instructions**: Add custom requirements for the rewrite
|
||||
|
||||
### 6. Image Generation
|
||||
- **AI Image Suggestions**: Get recommendations for relevant images
|
||||
- **Custom Image Generation**: Create images based on content
|
||||
- **Style Options**:
|
||||
- Realistic
|
||||
- Artistic
|
||||
- Cartoon
|
||||
- 3D Render
|
||||
- **Image Placement**: Suggested optimal placement within content
|
||||
|
||||
### 7. Export Options
|
||||
- **Preview Mode**: View formatted content
|
||||
- **Markdown Export**: Get clean markdown version
|
||||
- **Image Integration**: Include generated images with captions
|
||||
- **Meta Information**: Export with optimized title and meta description
|
||||
|
||||
## Usage
|
||||
|
||||
1. **Import Content**
|
||||
- Choose between URL import or manual content entry
|
||||
- Provide necessary metadata (title, author, etc.)
|
||||
|
||||
2. **Analysis & Research**
|
||||
- Review content analysis metrics
|
||||
- Examine research findings
|
||||
- Identify areas for improvement
|
||||
|
||||
3. **Configure Rewrite Settings**
|
||||
- Select rewrite mode
|
||||
- Choose target tone
|
||||
- Set content length
|
||||
- Add focus keywords
|
||||
- Provide special instructions
|
||||
|
||||
4. **Review & Export**
|
||||
- Preview rewritten content
|
||||
- Generate suggested images
|
||||
- Export in desired format
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Dependencies
|
||||
- Streamlit for UI
|
||||
- BeautifulSoup for content extraction
|
||||
- GPT providers for text generation
|
||||
- Image generation capabilities
|
||||
- Web research APIs (Exa, Tavily)
|
||||
|
||||
### Key Components
|
||||
- `BlogRewriter` class: Core functionality
|
||||
- Content extraction and analysis
|
||||
- Research integration
|
||||
- AI-powered rewriting
|
||||
- Image generation
|
||||
- Export capabilities
|
||||
|
||||
### Error Handling
|
||||
- Robust error handling for URL extraction
|
||||
- Fallback mechanisms for content parsing
|
||||
- Graceful degradation for API failures
|
||||
- User-friendly error messages
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Content Import**
|
||||
- Use clean, well-structured URLs
|
||||
- Provide complete metadata for manual entry
|
||||
- Ensure content is properly formatted
|
||||
|
||||
2. **Research Settings**
|
||||
- Choose appropriate research depth
|
||||
- Review research findings carefully
|
||||
- Verify source credibility
|
||||
|
||||
3. **Rewrite Configuration**
|
||||
- Select appropriate tone for audience
|
||||
- Use relevant focus keywords
|
||||
- Provide clear special instructions
|
||||
|
||||
4. **Image Generation**
|
||||
- Use descriptive prompts
|
||||
- Choose appropriate style
|
||||
- Consider image placement
|
||||
|
||||
## Limitations
|
||||
|
||||
- Maximum content length for processing
|
||||
- API rate limits for research
|
||||
- Image generation constraints
|
||||
- Language support limitations
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
- Multi-language support
|
||||
- Advanced SEO analysis
|
||||
- Content structure templates
|
||||
- Collaborative editing
|
||||
- Integration with CMS platforms
|
||||
- Custom AI model selection
|
||||
- Advanced image editing
|
||||
- Content versioning
|
||||
11
lib/ai_writers/blog_rewriter_updater/ai_blog_rewriter.py
Normal file
11
lib/ai_writers/blog_rewriter_updater/ai_blog_rewriter.py
Normal file
@@ -0,0 +1,11 @@
|
||||
"""
|
||||
AI Blog Rewriter Module
|
||||
|
||||
This module provides the main entry point for the blog rewriter functionality,
|
||||
importing and using the utility and UI modules.
|
||||
"""
|
||||
|
||||
from .blog_rewriter_ui import write_blog_rewriter
|
||||
|
||||
if __name__ == "__main__":
|
||||
write_blog_rewriter()
|
||||
624
lib/ai_writers/blog_rewriter_updater/blog_rewriter_ui.py
Normal file
624
lib/ai_writers/blog_rewriter_updater/blog_rewriter_ui.py
Normal file
@@ -0,0 +1,624 @@
|
||||
"""
|
||||
Blog Rewriter UI Module
|
||||
|
||||
This module contains the Streamlit interface for the blog rewriter,
|
||||
providing a user-friendly way to interact with the rewriting functionality.
|
||||
"""
|
||||
|
||||
import streamlit as st
|
||||
import json
|
||||
from datetime import datetime
|
||||
from .blog_rewriter_utils import BlogRewriter, REWRITE_MODES, TONE_OPTIONS, MAX_META_DESCRIPTION_LENGTH
|
||||
|
||||
def write_blog_rewriter():
|
||||
"""Main function to display the blog rewriter UI."""
|
||||
st.title("AI Blog Rewriter & Updater")
|
||||
|
||||
# Create a container for the header section
|
||||
with st.container():
|
||||
st.markdown("""
|
||||
<div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; margin-bottom: 20px;">
|
||||
<h3 style="margin-top: 0;">Revitalize Your Content</h3>
|
||||
<p>Update, fact-check, and enhance your existing blog posts with AI assistance.
|
||||
Our tool analyzes your content, researches the latest information, and rewrites your blog
|
||||
to be more engaging, accurate, and SEO-friendly.</p>
|
||||
</div>
|
||||
""", unsafe_allow_html=True)
|
||||
|
||||
# Initialize the BlogRewriter class
|
||||
if "blog_rewriter" not in st.session_state:
|
||||
st.session_state.blog_rewriter = BlogRewriter()
|
||||
|
||||
# Initialize session state variables
|
||||
if "original_content" not in st.session_state:
|
||||
st.session_state.original_content = {}
|
||||
if "content_analysis" not in st.session_state:
|
||||
st.session_state.content_analysis = {}
|
||||
if "research_results" not in st.session_state:
|
||||
st.session_state.research_results = {}
|
||||
if "rewritten_content" not in st.session_state:
|
||||
st.session_state.rewritten_content = {}
|
||||
if "generated_images" not in st.session_state:
|
||||
st.session_state.generated_images = {}
|
||||
if "current_step" not in st.session_state:
|
||||
st.session_state.current_step = 1
|
||||
|
||||
# Create tabs for the workflow
|
||||
tab1, tab2, tab3, tab4 = st.tabs([
|
||||
"1️⃣ Import Content",
|
||||
"2️⃣ Analyze & Research",
|
||||
"3️⃣ Rewrite Settings",
|
||||
"4️⃣ Results & Export"
|
||||
])
|
||||
|
||||
# Tab 1: Import Content
|
||||
with tab1:
|
||||
st.header("Import Your Blog Content")
|
||||
|
||||
import_method = st.radio(
|
||||
"Choose import method:",
|
||||
["Import from URL", "Paste content manually"],
|
||||
horizontal=True
|
||||
)
|
||||
|
||||
if import_method == "Import from URL":
|
||||
url = st.text_input(
|
||||
"Enter blog URL:",
|
||||
placeholder="https://example.com/blog-post",
|
||||
help="Enter the full URL of the blog post you want to rewrite"
|
||||
)
|
||||
|
||||
if st.button("Import Content", type="primary"):
|
||||
if not url:
|
||||
st.error("Please enter a valid URL")
|
||||
else:
|
||||
with st.spinner("Extracting content from URL..."):
|
||||
# Extract content from URL
|
||||
st.session_state.original_content = st.session_state.blog_rewriter.extract_content_from_url(url)
|
||||
|
||||
if "error" in st.session_state.original_content:
|
||||
st.error(f"Error extracting content: {st.session_state.original_content['error']}")
|
||||
else:
|
||||
st.success("Content extracted successfully!")
|
||||
st.session_state.current_step = 2
|
||||
st.rerun()
|
||||
else:
|
||||
col1, col2 = st.columns([3, 1])
|
||||
|
||||
with col1:
|
||||
title = st.text_input(
|
||||
"Blog Title:",
|
||||
placeholder="Enter the title of your blog post"
|
||||
)
|
||||
|
||||
with col2:
|
||||
author = st.text_input(
|
||||
"Author (optional):",
|
||||
placeholder="Author name"
|
||||
)
|
||||
|
||||
meta_description = st.text_area(
|
||||
"Meta Description (optional):",
|
||||
placeholder="Enter the meta description of your blog post",
|
||||
max_chars=MAX_META_DESCRIPTION_LENGTH,
|
||||
height=80
|
||||
)
|
||||
|
||||
content = st.text_area(
|
||||
"Blog Content:",
|
||||
placeholder="Paste your blog content here...",
|
||||
height=300
|
||||
)
|
||||
|
||||
if st.button("Import Content", type="primary"):
|
||||
if not title or not content:
|
||||
st.error("Please enter both title and content")
|
||||
else:
|
||||
# Store the manually entered content
|
||||
st.session_state.original_content = {
|
||||
"title": title,
|
||||
"meta_description": meta_description,
|
||||
"content": content,
|
||||
"author": author,
|
||||
"headings": [],
|
||||
"images": [],
|
||||
"publish_date": None,
|
||||
"url": None
|
||||
}
|
||||
|
||||
st.success("Content imported successfully!")
|
||||
st.session_state.current_step = 2
|
||||
st.rerun()
|
||||
|
||||
# Display the imported content if available
|
||||
if st.session_state.original_content and "title" in st.session_state.original_content:
|
||||
with st.expander("View Imported Content", expanded=False):
|
||||
st.subheader(st.session_state.original_content["title"])
|
||||
|
||||
if st.session_state.original_content.get("meta_description"):
|
||||
st.markdown(f"**Meta Description:** {st.session_state.original_content['meta_description']}")
|
||||
|
||||
if st.session_state.original_content.get("author"):
|
||||
st.markdown(f"**Author:** {st.session_state.original_content['author']}")
|
||||
|
||||
if st.session_state.original_content.get("publish_date"):
|
||||
st.markdown(f"**Published:** {st.session_state.original_content['publish_date']}")
|
||||
|
||||
st.markdown("**Content Preview:**")
|
||||
content_preview = st.session_state.original_content["content"]
|
||||
if len(content_preview) > 1000:
|
||||
content_preview = content_preview[:1000] + "..."
|
||||
st.text_area("", content_preview, height=200, disabled=True)
|
||||
|
||||
# Display images if available
|
||||
if st.session_state.original_content.get("images"):
|
||||
st.markdown(f"**Images:** {len(st.session_state.original_content['images'])} images found")
|
||||
|
||||
# Tab 2: Analyze & Research
|
||||
with tab2:
|
||||
st.header("Analyze & Research")
|
||||
|
||||
if not st.session_state.original_content or "title" not in st.session_state.original_content:
|
||||
st.info("Please import your blog content first")
|
||||
else:
|
||||
col1, col2 = st.columns(2)
|
||||
|
||||
with col1:
|
||||
if st.button("Analyze Content", type="primary"):
|
||||
with st.spinner("Analyzing content..."):
|
||||
# Analyze the content
|
||||
st.session_state.content_analysis = st.session_state.blog_rewriter.analyze_content(
|
||||
st.session_state.original_content
|
||||
)
|
||||
st.success("Content analysis complete!")
|
||||
|
||||
with col2:
|
||||
research_depth = st.selectbox(
|
||||
"Research Depth:",
|
||||
["low", "medium", "high"],
|
||||
index=1,
|
||||
format_func=lambda x: {"low": "Basic", "medium": "Standard", "high": "Comprehensive"}[x],
|
||||
help="Choose the depth of research to update your content"
|
||||
)
|
||||
|
||||
if st.button("Conduct Research", type="primary"):
|
||||
with st.spinner("Researching latest information..."):
|
||||
# Conduct research
|
||||
st.session_state.research_results = st.session_state.blog_rewriter.conduct_research(
|
||||
st.session_state.original_content["title"],
|
||||
st.session_state.original_content["content"],
|
||||
research_depth
|
||||
)
|
||||
st.success("Research complete!")
|
||||
|
||||
# Display content analysis if available
|
||||
if st.session_state.content_analysis:
|
||||
st.subheader("Content Analysis")
|
||||
|
||||
metrics = st.session_state.content_analysis.get("metrics", {})
|
||||
|
||||
# Create metrics display
|
||||
col1, col2, col3, col4 = st.columns(4)
|
||||
with col1:
|
||||
st.metric("Word Count", metrics.get("word_count", 0))
|
||||
with col2:
|
||||
st.metric("Paragraphs", metrics.get("paragraph_count", 0))
|
||||
with col3:
|
||||
st.metric("Sentences", metrics.get("sentence_count", 0))
|
||||
with col4:
|
||||
content_age = st.session_state.content_analysis.get("content_age", {})
|
||||
if "months" in content_age:
|
||||
st.metric("Content Age", f"{content_age['months']} months")
|
||||
elif "error" in content_age:
|
||||
st.metric("Content Age", "Unknown")
|
||||
|
||||
# Heading structure
|
||||
heading_structure = st.session_state.content_analysis.get("heading_structure", {})
|
||||
if heading_structure:
|
||||
st.markdown("**Heading Structure:**")
|
||||
for level, count in sorted(heading_structure.items()):
|
||||
st.markdown(f"H{level}: {count} headings")
|
||||
|
||||
# Image analysis
|
||||
images = st.session_state.content_analysis.get("images", {})
|
||||
if images:
|
||||
st.markdown(f"**Images:** {images.get('count', 0)} images found, {images.get('with_alt_text', 0)} with alt text")
|
||||
|
||||
# Display research results if available
|
||||
if st.session_state.research_results:
|
||||
st.subheader("Research Results")
|
||||
|
||||
topics = st.session_state.research_results.get("topics", [])
|
||||
if topics:
|
||||
for topic in topics:
|
||||
with st.expander(f"Topic: {topic['topic']}", expanded=False):
|
||||
for i, source in enumerate(topic.get("sources", [])):
|
||||
st.markdown(f"**Source {i+1}:** {source.get('title', 'Untitled')}")
|
||||
st.markdown(f"**URL:** {source.get('url', 'No URL')}")
|
||||
st.markdown(f"**Content Preview:** {source.get('content', 'No content')[:200]}...")
|
||||
st.markdown("---")
|
||||
else:
|
||||
st.info("No research results available")
|
||||
|
||||
# Enable proceeding to the next step if both analysis and research are done
|
||||
if st.session_state.content_analysis and st.session_state.research_results:
|
||||
if st.button("Proceed to Rewrite Settings", type="primary"):
|
||||
st.session_state.current_step = 3
|
||||
st.rerun()
|
||||
|
||||
# Tab 3: Rewrite Settings
|
||||
with tab3:
|
||||
st.header("Rewrite Settings")
|
||||
|
||||
if not st.session_state.original_content or "title" not in st.session_state.original_content:
|
||||
st.info("Please import your blog content first")
|
||||
elif not st.session_state.content_analysis or not st.session_state.research_results:
|
||||
st.info("Please complete content analysis and research first")
|
||||
else:
|
||||
# Create a form for rewrite settings
|
||||
with st.form("rewrite_settings_form"):
|
||||
st.subheader("Content Transformation")
|
||||
|
||||
col1, col2 = st.columns(2)
|
||||
|
||||
with col1:
|
||||
rewrite_mode = st.selectbox(
|
||||
"Rewrite Mode:",
|
||||
list(REWRITE_MODES.keys()),
|
||||
format_func=lambda x: x.replace("_", " ").title(),
|
||||
help="Choose how you want to transform your content"
|
||||
)
|
||||
|
||||
st.info(REWRITE_MODES[rewrite_mode])
|
||||
|
||||
with col2:
|
||||
tone = st.selectbox(
|
||||
"Target Tone:",
|
||||
TONE_OPTIONS,
|
||||
index=0,
|
||||
help="Choose the tone for your rewritten content"
|
||||
)
|
||||
|
||||
st.subheader("Content Length")
|
||||
|
||||
original_word_count = st.session_state.content_analysis.get("metrics", {}).get("word_count", 0)
|
||||
|
||||
length_option = st.radio(
|
||||
"Target Length:",
|
||||
["same", "shorter", "longer", "custom"],
|
||||
format_func=lambda x: {
|
||||
"same": f"Same as original ({original_word_count} words)",
|
||||
"shorter": f"Shorter (about {int(original_word_count * 0.7)} words)",
|
||||
"longer": f"Longer (about {int(original_word_count * 1.3)} words)",
|
||||
"custom": "Custom word count"
|
||||
}[x],
|
||||
horizontal=True
|
||||
)
|
||||
|
||||
if length_option == "custom":
|
||||
target_word_count = st.number_input(
|
||||
"Custom Word Count:",
|
||||
min_value=100,
|
||||
max_value=10000,
|
||||
value=original_word_count,
|
||||
step=100
|
||||
)
|
||||
else:
|
||||
target_word_count = {
|
||||
"same": original_word_count,
|
||||
"shorter": int(original_word_count * 0.7),
|
||||
"longer": int(original_word_count * 1.3)
|
||||
}[length_option]
|
||||
|
||||
st.subheader("SEO Optimization")
|
||||
|
||||
keywords = st.text_input(
|
||||
"Focus Keywords (comma-separated):",
|
||||
placeholder="e.g., digital marketing, SEO, content strategy",
|
||||
help="Enter keywords to optimize your content for"
|
||||
)
|
||||
|
||||
st.subheader("Additional Instructions")
|
||||
|
||||
special_instructions = st.text_area(
|
||||
"Special Instructions (optional):",
|
||||
placeholder="Add any specific instructions for rewriting your content...",
|
||||
help="Provide any additional instructions for the AI"
|
||||
)
|
||||
|
||||
# Submit button
|
||||
submitted = st.form_submit_button("Rewrite Blog", type="primary")
|
||||
|
||||
if submitted:
|
||||
# Process the form data
|
||||
user_preferences = {
|
||||
"rewrite_mode": rewrite_mode,
|
||||
"tone": tone,
|
||||
"target_word_count": target_word_count,
|
||||
"keywords": [k.strip() for k in keywords.split(",")] if keywords else [],
|
||||
"special_instructions": special_instructions
|
||||
}
|
||||
|
||||
with st.spinner("Rewriting your blog..."):
|
||||
# Rewrite the blog
|
||||
st.session_state.rewritten_content = st.session_state.blog_rewriter.rewrite_blog(
|
||||
st.session_state.original_content,
|
||||
user_preferences,
|
||||
st.session_state.research_results,
|
||||
st.session_state.content_analysis
|
||||
)
|
||||
|
||||
if "error" in st.session_state.rewritten_content:
|
||||
st.error(f"Error rewriting blog: {st.session_state.rewritten_content['error']}")
|
||||
else:
|
||||
st.success("Blog rewritten successfully!")
|
||||
st.session_state.current_step = 4
|
||||
st.rerun()
|
||||
|
||||
# Tab 4: Results & Export
|
||||
with tab4:
|
||||
st.header("Results & Export")
|
||||
|
||||
if not st.session_state.rewritten_content or "title" not in st.session_state.rewritten_content:
|
||||
st.info("Please complete the rewriting process first")
|
||||
else:
|
||||
# Display the rewritten content
|
||||
st.subheader("Rewritten Blog")
|
||||
|
||||
# Title and meta description
|
||||
st.markdown(f"## {st.session_state.rewritten_content['title']}")
|
||||
|
||||
if st.session_state.rewritten_content.get("meta_description"):
|
||||
with st.expander("Meta Description", expanded=True):
|
||||
st.text_area(
|
||||
"",
|
||||
st.session_state.rewritten_content["meta_description"],
|
||||
height=80,
|
||||
disabled=True
|
||||
)
|
||||
|
||||
# Create tabs for different views
|
||||
content_tab1, content_tab2 = st.tabs(["Preview", "Markdown"])
|
||||
|
||||
with content_tab1:
|
||||
st.markdown(st.session_state.rewritten_content["content"])
|
||||
|
||||
with content_tab2:
|
||||
st.text_area(
|
||||
"",
|
||||
st.session_state.rewritten_content["content"],
|
||||
height=400
|
||||
)
|
||||
|
||||
# Image generation section
|
||||
st.subheader("Generate Images")
|
||||
|
||||
suggested_images = st.session_state.rewritten_content.get("suggested_images", [])
|
||||
if suggested_images:
|
||||
st.markdown("**Suggested Images:**")
|
||||
|
||||
for i, img in enumerate(suggested_images):
|
||||
with st.expander(f"Image {i+1}: {img.get('description', 'No description')}", expanded=False):
|
||||
st.markdown(f"**Description:** {img.get('description', 'No description')}")
|
||||
st.markdown(f"**Caption:** {img.get('caption', 'No caption')}")
|
||||
st.markdown(f"**Placement:** {img.get('placement', 'No placement specified')}")
|
||||
|
||||
# Generate image button
|
||||
col1, col2 = st.columns([3, 1])
|
||||
|
||||
with col1:
|
||||
image_prompt = st.text_area(
|
||||
"Image Prompt:",
|
||||
value=img.get('description', ''),
|
||||
key=f"image_prompt_{i}"
|
||||
)
|
||||
|
||||
with col2:
|
||||
style = st.selectbox(
|
||||
"Style:",
|
||||
["realistic", "artistic", "cartoon", "3d_render"],
|
||||
key=f"style_{i}"
|
||||
)
|
||||
|
||||
if st.button("Generate Image", key=f"gen_img_{i}"):
|
||||
with st.spinner("Generating image..."):
|
||||
image_path = st.session_state.blog_rewriter.generate_image(image_prompt, style)
|
||||
|
||||
if image_path:
|
||||
# Store the generated image
|
||||
if "generated_images" not in st.session_state:
|
||||
st.session_state.generated_images = {}
|
||||
|
||||
st.session_state.generated_images[f"image_{i}"] = {
|
||||
"path": image_path,
|
||||
"caption": img.get('caption', ''),
|
||||
"placement": img.get('placement', '')
|
||||
}
|
||||
|
||||
st.success("Image generated successfully!")
|
||||
st.rerun()
|
||||
|
||||
# Display the generated image if available
|
||||
if f"image_{i}" in st.session_state.generated_images:
|
||||
st.image(
|
||||
st.session_state.generated_images[f"image_{i}"]["path"],
|
||||
caption=st.session_state.generated_images[f"image_{i}"]["caption"],
|
||||
use_column_width=True
|
||||
)
|
||||
else:
|
||||
st.info("No image suggestions available")
|
||||
|
||||
# Custom image generation
|
||||
with st.expander("Generate Custom Image", expanded=True):
|
||||
col1, col2 = st.columns([3, 1])
|
||||
|
||||
with col1:
|
||||
custom_image_prompt = st.text_area(
|
||||
"Image Prompt:",
|
||||
placeholder="Describe the image you want to generate..."
|
||||
)
|
||||
|
||||
with col2:
|
||||
custom_style = st.selectbox(
|
||||
"Style:",
|
||||
["realistic", "artistic", "cartoon", "3d_render"]
|
||||
)
|
||||
|
||||
if st.button("Generate Custom Image"):
|
||||
if not custom_image_prompt:
|
||||
st.error("Please enter an image prompt")
|
||||
else:
|
||||
with st.spinner("Generating image..."):
|
||||
image_path = st.session_state.blog_rewriter.generate_image(custom_image_prompt, custom_style)
|
||||
|
||||
if image_path:
|
||||
# Store the generated image
|
||||
if "generated_images" not in st.session_state:
|
||||
st.session_state.generated_images = {}
|
||||
|
||||
st.session_state.generated_images["custom_image"] = {
|
||||
"path": image_path,
|
||||
"caption": "Custom generated image",
|
||||
"placement": "Custom placement"
|
||||
}
|
||||
|
||||
st.success("Image generated successfully!")
|
||||
st.rerun()
|
||||
|
||||
# Display the generated custom image if available
|
||||
if "custom_image" in st.session_state.generated_images:
|
||||
st.image(
|
||||
st.session_state.generated_images["custom_image"]["path"],
|
||||
caption=st.session_state.generated_images["custom_image"]["caption"],
|
||||
use_column_width=True
|
||||
)
|
||||
|
||||
# Export options
|
||||
st.subheader("Export Options")
|
||||
|
||||
col1, col2, col3 = st.columns(3)
|
||||
|
||||
with col1:
|
||||
st.download_button(
|
||||
"Download as Markdown",
|
||||
data=st.session_state.rewritten_content["content"],
|
||||
file_name=f"{st.session_state.rewritten_content['title'].replace(' ', '_')}.md",
|
||||
mime="text/markdown"
|
||||
)
|
||||
|
||||
with col2:
|
||||
# Create HTML version
|
||||
html_content = f"""
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<title>{st.session_state.rewritten_content['title']}</title>
|
||||
<meta name="description" content="{st.session_state.rewritten_content.get('meta_description', '')}">
|
||||
<style>
|
||||
body {{ font-family: Arial, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; }}
|
||||
h1, h2, h3, h4, h5, h6 {{ color: #333; }}
|
||||
img {{ max-width: 100%; height: auto; }}
|
||||
pre {{ background-color: #f5f5f5; padding: 15px; border-radius: 5px; overflow-x: auto; }}
|
||||
blockquote {{ border-left: 5px solid #eee; padding-left: 15px; margin-left: 0; }}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>{st.session_state.rewritten_content['title']}</h1>
|
||||
{st.session_state.rewritten_content['content']}
|
||||
</body>
|
||||
</html>
|
||||
"""
|
||||
|
||||
st.download_button(
|
||||
"Download as HTML",
|
||||
data=html_content,
|
||||
file_name=f"{st.session_state.rewritten_content['title'].replace(' ', '_')}.html",
|
||||
mime="text/html"
|
||||
)
|
||||
|
||||
with col3:
|
||||
# Create JSON version with all content and metadata
|
||||
json_content = {
|
||||
"title": st.session_state.rewritten_content["title"],
|
||||
"meta_description": st.session_state.rewritten_content.get("meta_description", ""),
|
||||
"content": st.session_state.rewritten_content["content"],
|
||||
"suggested_images": st.session_state.rewritten_content.get("suggested_images", []),
|
||||
"generated_images": [
|
||||
{
|
||||
"caption": img_data["caption"],
|
||||
"placement": img_data["placement"],
|
||||
"path": img_data["path"]
|
||||
}
|
||||
for img_key, img_data in st.session_state.generated_images.items()
|
||||
] if hasattr(st.session_state, "generated_images") else [],
|
||||
"original_title": st.session_state.original_content.get("title", ""),
|
||||
"original_url": st.session_state.original_content.get("url", ""),
|
||||
"rewrite_date": datetime.now().isoformat()
|
||||
}
|
||||
|
||||
st.download_button(
|
||||
"Download as JSON",
|
||||
data=json.dumps(json_content, indent=2),
|
||||
file_name=f"{st.session_state.rewritten_content['title'].replace(' ', '_')}.json",
|
||||
mime="application/json"
|
||||
)
|
||||
|
||||
# Copy to clipboard buttons
|
||||
st.subheader("Quick Copy")
|
||||
|
||||
col1, col2, col3 = st.columns(3)
|
||||
|
||||
with col1:
|
||||
if st.button("Copy Title", key="copy_title"):
|
||||
st.code(st.session_state.rewritten_content["title"])
|
||||
st.success("Title copied to clipboard!")
|
||||
|
||||
with col2:
|
||||
if st.button("Copy Meta Description", key="copy_meta"):
|
||||
st.code(st.session_state.rewritten_content.get("meta_description", ""))
|
||||
st.success("Meta description copied to clipboard!")
|
||||
|
||||
with col3:
|
||||
if st.button("Copy Full Content", key="copy_content"):
|
||||
st.success("Content copied to clipboard!")
|
||||
|
||||
# Comparison with original
|
||||
with st.expander("Compare with Original", expanded=False):
|
||||
comp_col1, comp_col2 = st.columns(2)
|
||||
|
||||
with comp_col1:
|
||||
st.subheader("Original")
|
||||
st.markdown(f"**Title:** {st.session_state.original_content.get('title', '')}")
|
||||
if st.session_state.original_content.get("meta_description"):
|
||||
st.markdown(f"**Meta Description:** {st.session_state.original_content['meta_description']}")
|
||||
st.text_area(
|
||||
"Original Content",
|
||||
st.session_state.original_content.get("content", ""),
|
||||
height=300,
|
||||
disabled=True
|
||||
)
|
||||
|
||||
with comp_col2:
|
||||
st.subheader("Rewritten")
|
||||
st.markdown(f"**Title:** {st.session_state.rewritten_content['title']}")
|
||||
if st.session_state.rewritten_content.get("meta_description"):
|
||||
st.markdown(f"**Meta Description:** {st.session_state.rewritten_content['meta_description']}")
|
||||
st.text_area(
|
||||
"Rewritten Content",
|
||||
st.session_state.rewritten_content["content"],
|
||||
height=300,
|
||||
disabled=True
|
||||
)
|
||||
|
||||
# Start over button
|
||||
if st.button("Start Over", type="primary"):
|
||||
# Reset session state
|
||||
for key in ["original_content", "content_analysis", "research_results",
|
||||
"rewritten_content", "generated_images", "current_step"]:
|
||||
if key in st.session_state:
|
||||
del st.session_state[key]
|
||||
|
||||
st.rerun()
|
||||
|
||||
if __name__ == "__main__":
|
||||
write_blog_rewriter()
|
||||
595
lib/ai_writers/blog_rewriter_updater/blog_rewriter_utils.py
Normal file
595
lib/ai_writers/blog_rewriter_updater/blog_rewriter_utils.py
Normal file
@@ -0,0 +1,595 @@
|
||||
"""
|
||||
Blog Rewriter Utilities Module
|
||||
|
||||
This module contains the core functionality for rewriting and updating blog content,
|
||||
including content extraction, analysis, research, and rewriting capabilities.
|
||||
"""
|
||||
|
||||
import requests
|
||||
from bs4 import BeautifulSoup
|
||||
import re
|
||||
import time
|
||||
import logging
|
||||
from typing import Dict, List, Tuple, Optional, Any
|
||||
import json
|
||||
import os
|
||||
from datetime import datetime
|
||||
|
||||
# Import required modules from the project
|
||||
from ...gpt_providers.text_generation.main_text_generation import llm_text_gen
|
||||
from ...gpt_providers.text_to_image_generation.main_generate_image_from_prompt import generate_image
|
||||
from ...ai_web_researcher.metaphor_basic_neural_web_search import metaphor_search_articles
|
||||
from ...ai_web_researcher.tavily_ai_search import do_tavily_ai_search
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Define constants
|
||||
MAX_TITLE_LENGTH = 70
|
||||
MAX_META_DESCRIPTION_LENGTH = 160
|
||||
REWRITE_MODES = {
|
||||
"standard": "Standard rewrite with improved clarity and flow",
|
||||
"seo_optimization": "Optimize for search engines with targeted keywords",
|
||||
"simplification": "Simplify complex content for broader audience",
|
||||
"expansion": "Expand with additional details and examples",
|
||||
"fact_check": "Focus on fact-checking and updating information",
|
||||
"tone_shift": "Change the tone while preserving content",
|
||||
"modernization": "Update outdated content with current information"
|
||||
}
|
||||
|
||||
# Define tone options
|
||||
TONE_OPTIONS = [
|
||||
"Professional", "Conversational", "Academic", "Enthusiastic",
|
||||
"Authoritative", "Friendly", "Technical", "Inspirational"
|
||||
]
|
||||
|
||||
class BlogRewriter:
|
||||
"""Class to handle blog rewriting functionality."""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the BlogRewriter class."""
|
||||
self.original_content = {}
|
||||
self.rewritten_content = {}
|
||||
self.research_results = {}
|
||||
self.content_analysis = {}
|
||||
self.image_suggestions = []
|
||||
|
||||
def extract_content_from_url(self, url: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract content from a given URL.
|
||||
|
||||
Args:
|
||||
url: The URL to extract content from
|
||||
|
||||
Returns:
|
||||
Dictionary containing extracted content
|
||||
"""
|
||||
logger.info(f"Extracting content from URL: {url}")
|
||||
|
||||
try:
|
||||
headers = {
|
||||
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
|
||||
'Accept-Language': 'en-US,en;q=0.5',
|
||||
'Connection': 'keep-alive',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'Cache-Control': 'max-age=0'
|
||||
}
|
||||
response = requests.get(url, headers=headers, timeout=15)
|
||||
response.raise_for_status()
|
||||
|
||||
soup = BeautifulSoup(response.text, 'html.parser')
|
||||
|
||||
# Extract title
|
||||
title = soup.title.string if soup.title else ""
|
||||
|
||||
# Extract meta description
|
||||
meta_desc = ""
|
||||
meta_tag = soup.find("meta", attrs={"name": "description"})
|
||||
if meta_tag and "content" in meta_tag.attrs:
|
||||
meta_desc = meta_tag["content"]
|
||||
|
||||
# Extract main content - try multiple strategies
|
||||
content = ""
|
||||
|
||||
# Strategy 1: Look for article tag
|
||||
article_tag = soup.find("article")
|
||||
if article_tag:
|
||||
content = article_tag.get_text(separator="\n\n")
|
||||
|
||||
# Strategy 2: Look for main content areas
|
||||
if not content:
|
||||
main_content = soup.find(["main", "div", "section"], class_=re.compile(r"content|article|post|entry|main|body"))
|
||||
if main_content:
|
||||
for elem in main_content.find_all(["nav", "aside", "footer", "comments", "script", "style", "header"]):
|
||||
elem.decompose()
|
||||
content = main_content.get_text(separator="\n\n")
|
||||
|
||||
# Strategy 3: Look for specific content classes
|
||||
if not content:
|
||||
content_classes = ["post-content", "entry-content", "article-content", "blog-content", "content-area"]
|
||||
for class_name in content_classes:
|
||||
content_div = soup.find("div", class_=class_name)
|
||||
if content_div:
|
||||
for elem in content_div.find_all(["nav", "aside", "footer", "comments", "script", "style", "header"]):
|
||||
elem.decompose()
|
||||
content = content_div.get_text(separator="\n\n")
|
||||
break
|
||||
|
||||
# Strategy 4: Look for content within body
|
||||
if not content:
|
||||
body = soup.find("body")
|
||||
if body:
|
||||
# Remove unwanted elements
|
||||
for elem in body.find_all(["nav", "aside", "footer", "comments", "script", "style", "header"]):
|
||||
elem.decompose()
|
||||
content = body.get_text(separator="\n\n")
|
||||
|
||||
# Clean up the content
|
||||
content = re.sub(r'\n{3,}', '\n\n', content)
|
||||
content = re.sub(r'\s{2,}', ' ', content)
|
||||
content = content.strip()
|
||||
|
||||
# Extract headings with their hierarchy
|
||||
headings = []
|
||||
for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
|
||||
headings.append({
|
||||
"level": int(h.name[1]),
|
||||
"text": h.get_text().strip()
|
||||
})
|
||||
|
||||
# Extract images with more metadata
|
||||
images = []
|
||||
for img in soup.find_all("img"):
|
||||
if img.get("src") and not img.get("src").startswith("data:"):
|
||||
image_url = img.get("src")
|
||||
if not image_url.startswith(("http://", "https://")):
|
||||
base_url = "/".join(url.split("/")[:3])
|
||||
image_url = f"{base_url}/{image_url.lstrip('/')}"
|
||||
|
||||
images.append({
|
||||
"url": image_url,
|
||||
"alt_text": img.get("alt", ""),
|
||||
"title": img.get("title", ""),
|
||||
"class": img.get("class", []),
|
||||
"width": img.get("width"),
|
||||
"height": img.get("height")
|
||||
})
|
||||
|
||||
# Extract publish date with multiple strategies
|
||||
publish_date = None
|
||||
# Try meta tags first
|
||||
date_meta = soup.find("meta", attrs={"property": "article:published_time"})
|
||||
if date_meta and "content" in date_meta.attrs:
|
||||
publish_date = date_meta["content"]
|
||||
else:
|
||||
# Try other meta tags
|
||||
for prop in ["datePublished", "dateCreated", "dateModified"]:
|
||||
date_meta = soup.find("meta", attrs={"property": prop})
|
||||
if date_meta and "content" in date_meta.attrs:
|
||||
publish_date = date_meta["content"]
|
||||
break
|
||||
|
||||
# Try HTML elements if meta tags failed
|
||||
if not publish_date:
|
||||
date_elem = soup.find(["time", "span", "div"], class_=re.compile(r"date|time|publish|posted|created"))
|
||||
if date_elem and date_elem.get_text():
|
||||
publish_date = date_elem.get_text().strip()
|
||||
|
||||
# Extract author with multiple strategies
|
||||
author = None
|
||||
# Try meta tags first
|
||||
author_meta = soup.find("meta", attrs={"name": "author"})
|
||||
if author_meta and "content" in author_meta.attrs:
|
||||
author = author_meta["content"]
|
||||
else:
|
||||
# Try other meta tags
|
||||
for prop in ["article:author", "author"]:
|
||||
author_meta = soup.find("meta", attrs={"property": prop})
|
||||
if author_meta and "content" in author_meta.attrs:
|
||||
author = author_meta["content"]
|
||||
break
|
||||
|
||||
# Try HTML elements if meta tags failed
|
||||
if not author:
|
||||
author_elem = soup.find(["a", "span", "div"], class_=re.compile(r"author|byline|writer|posted-by"))
|
||||
if author_elem and author_elem.get_text():
|
||||
author = author_elem.get_text().strip()
|
||||
|
||||
# Log content extraction results
|
||||
logger.info(f"Extracted content length: {len(content)} characters")
|
||||
logger.info(f"Found {len(headings)} headings")
|
||||
logger.info(f"Found {len(images)} images")
|
||||
logger.info(f"Publish date: {publish_date}")
|
||||
logger.info(f"Author: {author}")
|
||||
|
||||
return {
|
||||
"title": title,
|
||||
"meta_description": meta_desc,
|
||||
"content": content,
|
||||
"headings": headings,
|
||||
"images": images,
|
||||
"publish_date": publish_date,
|
||||
"author": author,
|
||||
"url": url
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error extracting content from URL: {e}")
|
||||
return {
|
||||
"title": "",
|
||||
"meta_description": "",
|
||||
"content": "",
|
||||
"headings": [],
|
||||
"images": [],
|
||||
"publish_date": None,
|
||||
"author": None,
|
||||
"url": url,
|
||||
"error": str(e)
|
||||
}
|
||||
|
||||
def analyze_content(self, content: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Analyze the extracted content to provide insights.
|
||||
|
||||
Args:
|
||||
content: Dictionary containing extracted content
|
||||
|
||||
Returns:
|
||||
Dictionary containing content analysis
|
||||
"""
|
||||
logger.info("Analyzing content")
|
||||
|
||||
analysis = {}
|
||||
|
||||
# Basic metrics
|
||||
text_content = content.get("content", "")
|
||||
word_count = len(text_content.split())
|
||||
sentence_count = len(re.split(r'[.!?]+', text_content))
|
||||
paragraph_count = len(re.split(r'\n\n+', text_content))
|
||||
|
||||
analysis["metrics"] = {
|
||||
"word_count": word_count,
|
||||
"sentence_count": sentence_count,
|
||||
"paragraph_count": paragraph_count,
|
||||
"avg_words_per_sentence": round(word_count / max(sentence_count, 1), 1),
|
||||
"avg_sentences_per_paragraph": round(sentence_count / max(paragraph_count, 1), 1)
|
||||
}
|
||||
|
||||
# Heading structure analysis
|
||||
headings = content.get("headings", [])
|
||||
heading_structure = {}
|
||||
for h in headings:
|
||||
level = h["level"]
|
||||
if level not in heading_structure:
|
||||
heading_structure[level] = 0
|
||||
heading_structure[level] += 1
|
||||
|
||||
analysis["heading_structure"] = heading_structure
|
||||
|
||||
# Content age analysis
|
||||
publish_date = content.get("publish_date")
|
||||
if publish_date:
|
||||
try:
|
||||
if "T" in publish_date:
|
||||
pub_date = datetime.fromisoformat(publish_date.replace("Z", "+00:00"))
|
||||
else:
|
||||
date_formats = [
|
||||
"%Y-%m-%d", "%d-%m-%Y", "%B %d, %Y", "%b %d, %Y",
|
||||
"%d %B %Y", "%d %b %Y", "%Y/%m/%d", "%d/%m/%Y"
|
||||
]
|
||||
for fmt in date_formats:
|
||||
try:
|
||||
pub_date = datetime.strptime(publish_date, fmt)
|
||||
break
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
now = datetime.now()
|
||||
age_days = (now - pub_date).days
|
||||
analysis["content_age"] = {
|
||||
"days": age_days,
|
||||
"months": round(age_days / 30, 1),
|
||||
"years": round(age_days / 365, 1)
|
||||
}
|
||||
except Exception as e:
|
||||
logger.warning(f"Could not parse publish date: {e}")
|
||||
analysis["content_age"] = {"error": "Could not determine content age"}
|
||||
else:
|
||||
analysis["content_age"] = {"error": "No publish date found"}
|
||||
|
||||
# Image analysis
|
||||
images = content.get("images", [])
|
||||
analysis["images"] = {
|
||||
"count": len(images),
|
||||
"with_alt_text": sum(1 for img in images if img.get("alt_text"))
|
||||
}
|
||||
|
||||
return analysis
|
||||
|
||||
def conduct_research(self, title: str, content: str, research_depth: str = "medium") -> Dict[str, Any]:
|
||||
"""
|
||||
Conduct web research to find updated information related to the blog content.
|
||||
|
||||
Args:
|
||||
title: Blog title
|
||||
content: Blog content
|
||||
research_depth: Depth of research (low, medium, high)
|
||||
|
||||
Returns:
|
||||
Dictionary containing research results
|
||||
"""
|
||||
logger.info(f"Conducting research with depth: {research_depth}")
|
||||
|
||||
# Extract key topics from the content
|
||||
prompt = f"""
|
||||
Extract 3-5 key topics or claims from this blog content that might need fact-checking or updating.
|
||||
For each topic, provide a concise search query that would help find the most recent information.
|
||||
|
||||
Blog title: {title}
|
||||
|
||||
First 1000 characters of content:
|
||||
{content[:1000]}...
|
||||
|
||||
Format your response as a JSON array of objects with 'topic' and 'query' fields.
|
||||
"""
|
||||
|
||||
try:
|
||||
topics_json = llm_text_gen(prompt)
|
||||
topics_json = re.search(r'\[.*\]', topics_json, re.DOTALL)
|
||||
if topics_json:
|
||||
topics = json.loads(topics_json.group(0))
|
||||
else:
|
||||
topics = [
|
||||
{"topic": title, "query": title + " latest information"},
|
||||
{"topic": "Updates on " + title, "query": title + " recent developments"}
|
||||
]
|
||||
except Exception as e:
|
||||
logger.error(f"Error extracting topics: {e}")
|
||||
topics = [
|
||||
{"topic": title, "query": title + " latest information"},
|
||||
{"topic": "Updates on " + title, "query": title + " recent developments"}
|
||||
]
|
||||
|
||||
# Determine number of results based on research depth
|
||||
num_results = {"low": 2, "medium": 3, "high": 5}.get(research_depth, 3)
|
||||
|
||||
research_results = {"topics": []}
|
||||
|
||||
# Conduct research for each topic
|
||||
for topic in topics[:3]: # Limit to 3 topics
|
||||
topic_results = {"topic": topic["topic"], "sources": []}
|
||||
|
||||
# Try Exa search first
|
||||
try:
|
||||
exa_results = metaphor_search_articles(topic["query"], num_results=num_results)
|
||||
if exa_results:
|
||||
topic_results["sources"].extend(exa_results)
|
||||
except Exception as e:
|
||||
logger.warning(f"Exa search failed: {e}")
|
||||
|
||||
# If Exa didn't return enough results, try Tavily
|
||||
if len(topic_results["sources"]) < num_results:
|
||||
try:
|
||||
tavily_results = do_tavily_ai_search(topic["query"], num_results=num_results)
|
||||
if tavily_results:
|
||||
existing_urls = [s["url"] for s in topic_results["sources"]]
|
||||
for result in tavily_results:
|
||||
if result["url"] not in existing_urls:
|
||||
topic_results["sources"].append(result)
|
||||
existing_urls.append(result["url"])
|
||||
except Exception as e:
|
||||
logger.warning(f"Tavily search failed: {e}")
|
||||
|
||||
research_results["topics"].append(topic_results)
|
||||
|
||||
return research_results
|
||||
|
||||
def generate_rewrite_prompt(self, original_content: Dict[str, Any],
|
||||
user_preferences: Dict[str, Any],
|
||||
research_results: Dict[str, Any],
|
||||
content_analysis: Dict[str, Any]) -> str:
|
||||
"""
|
||||
Generate a prompt for the LLM to rewrite the blog.
|
||||
|
||||
Args:
|
||||
original_content: Original blog content
|
||||
user_preferences: User preferences for rewriting
|
||||
research_results: Research results for updating content
|
||||
content_analysis: Analysis of the original content
|
||||
|
||||
Returns:
|
||||
Prompt string for the LLM
|
||||
"""
|
||||
logger.info("Generating rewrite prompt")
|
||||
|
||||
# Extract key information
|
||||
title = original_content.get("title", "")
|
||||
content = original_content.get("content", "")
|
||||
|
||||
# Truncate content if it's too long
|
||||
max_content_length = 6000
|
||||
if len(content) > max_content_length:
|
||||
content_preview = content[:max_content_length] + "...\n[Content truncated due to length]"
|
||||
else:
|
||||
content_preview = content
|
||||
|
||||
# Format research results
|
||||
research_summary = ""
|
||||
for topic in research_results.get("topics", []):
|
||||
research_summary += f"\n## {topic['topic']}\n"
|
||||
for i, source in enumerate(topic.get("sources", [])[:3]):
|
||||
research_summary += f"Source {i+1}: {source.get('title', 'Untitled')}\n"
|
||||
research_summary += f"URL: {source.get('url', 'No URL')}\n"
|
||||
research_summary += f"Content: {source.get('content', 'No content')[:300]}...\n\n"
|
||||
|
||||
# Build the prompt
|
||||
prompt = f"""
|
||||
# Blog Rewriting Task
|
||||
|
||||
## Original Blog Information
|
||||
Title: {title}
|
||||
Word Count: {content_analysis.get('metrics', {}).get('word_count', 'Unknown')}
|
||||
Estimated Age: {content_analysis.get('content_age', {}).get('months', 'Unknown')} months
|
||||
|
||||
## Rewriting Instructions
|
||||
Mode: {user_preferences.get('rewrite_mode', 'standard')}
|
||||
Target Tone: {user_preferences.get('tone', 'Professional')}
|
||||
Target Word Count: {user_preferences.get('target_word_count', 'Same as original')}
|
||||
Focus Keywords: {', '.join(user_preferences.get('keywords', []))}
|
||||
|
||||
## Special Instructions
|
||||
{user_preferences.get('special_instructions', 'No special instructions')}
|
||||
|
||||
## Recent Research Findings
|
||||
{research_summary if research_summary else "No research results available."}
|
||||
|
||||
## Original Content
|
||||
{content_preview}
|
||||
|
||||
## Your Task
|
||||
Please rewrite this blog post according to the instructions above. The rewritten blog should:
|
||||
|
||||
1. Maintain the core message and value of the original content
|
||||
2. Update any outdated information based on the research findings
|
||||
3. Adopt the requested tone and style
|
||||
4. Incorporate the focus keywords naturally
|
||||
5. Improve readability and engagement
|
||||
6. Maintain a logical structure with appropriate headings
|
||||
7. Include a compelling introduction and conclusion
|
||||
|
||||
## Output Format
|
||||
Please provide your response in the following JSON format:
|
||||
```json
|
||||
{{
|
||||
"title": "Rewritten title",
|
||||
"meta_description": "SEO-optimized meta description (max 160 characters)",
|
||||
"content": "Full rewritten content with proper markdown formatting",
|
||||
"suggested_images": [
|
||||
{{
|
||||
"description": "Brief description of a suggested image",
|
||||
"caption": "Suggested caption for the image",
|
||||
"placement": "Where this image should be placed (e.g., 'After introduction', 'Before conclusion')"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
```
|
||||
|
||||
Ensure the JSON is properly formatted and valid.
|
||||
"""
|
||||
|
||||
return prompt
|
||||
|
||||
def rewrite_blog(self, original_content: Dict[str, Any],
|
||||
user_preferences: Dict[str, Any],
|
||||
research_results: Dict[str, Any],
|
||||
content_analysis: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Rewrite the blog based on original content, user preferences, and research.
|
||||
|
||||
Args:
|
||||
original_content: Original blog content
|
||||
user_preferences: User preferences for rewriting
|
||||
research_results: Research results for updating content
|
||||
content_analysis: Analysis of the original content
|
||||
|
||||
Returns:
|
||||
Dictionary containing rewritten content
|
||||
"""
|
||||
logger.info("Rewriting blog content")
|
||||
|
||||
# Generate the prompt
|
||||
prompt = self.generate_rewrite_prompt(
|
||||
original_content, user_preferences, research_results, content_analysis
|
||||
)
|
||||
|
||||
# Call the LLM to rewrite the content
|
||||
try:
|
||||
response = llm_text_gen(prompt)
|
||||
|
||||
# Clean the response of any invalid control characters
|
||||
response = ''.join(char for char in response if ord(char) >= 32 or char in '\n\r\t')
|
||||
|
||||
# Extract JSON from the response
|
||||
json_match = re.search(r'```json\s*(.*?)\s*```', response, re.DOTALL)
|
||||
if json_match:
|
||||
json_str = json_match.group(1)
|
||||
else:
|
||||
# If no JSON block found, try to find JSON-like content
|
||||
json_match = re.search(r'\{.*\}', response, re.DOTALL)
|
||||
if json_match:
|
||||
json_str = json_match.group(0)
|
||||
else:
|
||||
json_str = response
|
||||
|
||||
# Clean up the JSON string
|
||||
json_str = re.sub(r'```(json)?', '', json_str).strip()
|
||||
|
||||
# Remove any remaining invalid control characters
|
||||
json_str = ''.join(char for char in json_str if ord(char) >= 32 or char in '\n\r\t')
|
||||
|
||||
# Parse the JSON with error handling
|
||||
try:
|
||||
rewritten_content = json.loads(json_str)
|
||||
except json.JSONDecodeError as e:
|
||||
logger.error(f"JSON parsing error: {e}")
|
||||
# Try to fix common JSON issues
|
||||
json_str = json_str.replace('\\n', '\\\\n') # Fix escaped newlines
|
||||
json_str = json_str.replace('\\"', '"') # Fix escaped quotes
|
||||
json_str = json_str.replace('\\t', '\\\\t') # Fix escaped tabs
|
||||
rewritten_content = json.loads(json_str)
|
||||
|
||||
# Validate the response structure
|
||||
required_fields = ["title", "meta_description", "content"]
|
||||
for field in required_fields:
|
||||
if field not in rewritten_content:
|
||||
rewritten_content[field] = original_content.get(field, "")
|
||||
logger.warning(f"Missing required field '{field}' in rewritten content")
|
||||
|
||||
# Ensure suggested_images exists
|
||||
if "suggested_images" not in rewritten_content:
|
||||
rewritten_content["suggested_images"] = []
|
||||
|
||||
# Clean up the content field
|
||||
if "content" in rewritten_content:
|
||||
# Remove any remaining invalid control characters
|
||||
rewritten_content["content"] = ''.join(
|
||||
char for char in rewritten_content["content"]
|
||||
if ord(char) >= 32 or char in '\n\r\t'
|
||||
)
|
||||
# Normalize whitespace
|
||||
rewritten_content["content"] = re.sub(r'\s+', ' ', rewritten_content["content"])
|
||||
rewritten_content["content"] = re.sub(r'\n{3,}', '\n\n', rewritten_content["content"])
|
||||
|
||||
return rewritten_content
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error rewriting blog: {e}")
|
||||
return {
|
||||
"title": original_content.get("title", ""),
|
||||
"meta_description": original_content.get("meta_description", ""),
|
||||
"content": original_content.get("content", ""),
|
||||
"suggested_images": [],
|
||||
"error": str(e)
|
||||
}
|
||||
|
||||
def generate_image(self, image_prompt: str, style: str = "realistic") -> str:
|
||||
"""
|
||||
Generate an image based on the prompt.
|
||||
|
||||
Args:
|
||||
image_prompt: Prompt for image generation
|
||||
style: Style of the image
|
||||
|
||||
Returns:
|
||||
Path to the generated image
|
||||
"""
|
||||
logger.info(f"Generating image with prompt: {image_prompt}")
|
||||
|
||||
try:
|
||||
image_path = generate_image(image_prompt, style=style)
|
||||
return image_path
|
||||
except Exception as e:
|
||||
logger.error(f"Error generating image: {e}")
|
||||
return ""
|
||||
Reference in New Issue
Block a user