diff --git a/.gitignore b/.gitignore index 4e8cad2d..337e506a 100644 --- a/.gitignore +++ b/.gitignore @@ -3,24 +3,17 @@ __pycache__ pseo-experiemnts/ *.swp venv/ - *.pyc __pycache__/ - instance/ - .pytest_cache/ .coverage htmlcov/ - dist/ build/ *.egg-info/ - pseo-experiments/lib/python3.10/ - pseo-experiments/bin/ - blog_images/ - +blogs/ pseo_website/ diff --git a/LICENSE b/LICENSE deleted file mode 100644 index a6f6c9fe..00000000 --- a/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2021 Cotes Chung - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/README.md b/README.md index d7f5ec4a..8216eac9 100644 --- a/README.md +++ b/README.md @@ -1,25 +1,43 @@ +# AI Blog Creation and Management Toolkit + ## Introduction +This toolkit automates and enhances the process of blog creation, optimization, and management. Leveraging AI technologies, it assists content creators and digital marketers in generating, formatting, and uploading blog content efficiently. 
The toolkit integrates advanced AI models for text generation, image creation, and data analysis, streamlining the content creation pipeline. -Given high level domain keywords like "Fishing baits online" Or any 2-3 main key words that describe, broadly, your business. -This tool will produce a SEO optimized blogs. This tool will suggest most popular blog topics, divide them in sub topics and write content for each sub topic. For each of the paragraphs, we summarise it and pass the line for text to image. -Thus, the generated blog will have text and relevant images. +## Features -(TBD) Provide the blog output as plain text, markdown Or HTML. +### Blog Generation and Optimization +- **YouTube to Blog Conversion**: Converts YouTube videos into detailed blog posts by extracting and transcribing audio, then generating text-based content. +- **Online Research Integration**: Enhances blog content by integrating insights and information gathered from online research, ensuring the content is informative and up-to-date. +- **Image Generation and Processing**: Utilizes AI models like DALL-E 3 to create relevant images based on blog content. Offers features to process and optimize images for web usage. +- **SEO Optimization**: Employs AI to generate SEO-friendly blog titles, meta descriptions, tags, and categories. Ensures content is optimized for search engines. -Presently, wordpress and WIX integration is present for uploading the generated blog, but needs testing. +### Speech-to-Text Conversion +- **Audio Transcription**: Converts speech from video content into text, facilitating the creation of blogs and articles from video sources. -### This is based on openai gpt models for content generation, google bard for keyword research and some basic tools for plagiarism checker, SEO audit and suggestions to improve the generated content. -As prompts are the important ingredients to get the best result, they are stored in prompts folder. 
Edit these prompts to produce results as per your likings. +### AI-Driven Content Creation +- **Text Generation with OpenAI ChatGPT**: Leverages OpenAI's ChatGPT for generating creative and relevant text for blogs. +- **Customizable AI Parameters**: Offers flexibility in adjusting AI parameters like model selection, temperature, and token limits to suit different content needs. -- API based blog generation are much cheaper, almost 10x, but difficult to use for everyone. We use bard for search related prompts and chatgpt for generative requirements. -- Jekyll static website is included for checking how blogs will look locally. - -### Check TBD for features currently under development. +### Image Detail Extraction +- **Analyzing and Extracting Image Details**: Uses OpenAI's Vision API to analyze images and extract details such as alt text, descriptions, titles, and captions, enhancing the SEO of image content. -- Amazon affiliate links are also supported. Given, affiliate tag, your affiliate product links will included in the blogs. -To use the module, simply create an instance of the AmazonAffiliateImages class, passing in your Amazon affiliate tag. -Then, you can use the get_image_url() or get_image_html() methods to get the Amazon affiliate image URL or HTML -for a product, passing in either the product ASIN or the product URL. + +## Installation and Configuration +1. **Clone the Repository**: Clone the toolkit from the provided repository link. +2. **Install Dependencies**: Install necessary Python packages and libraries. + + +## Usage +The toolkit provides functions for generating blogs from YouTube videos, detailed blogs from provided keywords, and optimizing them for SEO and readability. It supports content uploading to WordPress and includes comprehensive error handling. + +## Installation +- Requires Python 3.x. +- Install dependencies: `openai`, `nltk`, `tqdm`, `loguru`. +- Set up API keys and credentials for OpenAI and WordPress. 
+ +--- + +**Note**: This toolkit is designed for automated blog management and requires appropriate API keys and access credentials for full functionality. ---------------------------------- @@ -45,30 +63,6 @@ options: - Example: python3 pseo_main.py --num_blogs "10" --keywords "Python, programming, data science" --niche True -- Output is directly written as blog post for pseo_website and can found in pseo_website/_posts directory -- Note: Follow instructions here to install jekyll : https://jekyllrb.com/docs/installation/ -- Read this: https://chirpy.cotes.page/posts/write-a-new-post/ - ----------------------------------- - -The generated blogs are present in generated_blogs folder. Presently, the blog template is rigid and follows the -below pattern: -[Blog Title] -[Introduction of n chars] -[Body] -[Body][topic][content of n chars on sub-topic] -[Conclusion] - -TBD: More templates and an easy way to change prompts are in pipeline. - ------------------------------------ - -PSEO Website: - -We are using jekyll static page generator and chirpy theme for it. -This is easy enough to programmatically control and publish content. - -Checkout pseo_website directory for details. ----------------------------------- diff --git a/TBD b/TBD deleted file mode 100644 index 701a7eee..00000000 --- a/TBD +++ /dev/null @@ -1,13 +0,0 @@ -1). https://github.com/hardikvasa/google-images-download -2). imagen from google -3). dalle-3 -4). Bing images - -5). Include gpt researcher : https://python.langchain.com/docs/use_cases/web_scraping - -6). We need memory to store blogs posts and not repeat them. -Have a database, or query from web hosting, of all the blogs present. - -7). Incorporate prompttemplate. - -8). 
Integrate website support for flask, django, wordpress, wix etc diff --git a/lib/blog_proof_reader.py b/lib/blog_proof_reader.py new file mode 100644 index 00000000..f806e285 --- /dev/null +++ b/lib/blog_proof_reader.py @@ -0,0 +1,33 @@ +def blog_proof_editor(blog_content, blog_keywords): + """ + Helper for blog proof reading. + """ + prompt = f"""I am looking for detailed editing and enhancement of the given blog post, + with a particular focus on maintaining originality. + The topic of the content is [{blog_keywords}]. Please go through the blog and make direct edits to improve it, + ensuring the final output is both high-quality and original. + Note: There are duplicate headings and corresponding paragraphs, rewrite into one subheading. + + Here are the specific areas to focus on: + + 1). Ensure Originality: Edit any sections that lack originality, replacing them with unique and creative content. + 2). Eliminate Repetitive Language: Rewrite repetitive phrases with varied and engaging language. + 3). Vocabulary and Grammar Enhancement: Directly correct any grammatical errors and upgrade the + vocabulary for better readability. + 4). Improve Sentence Structure: Enhance sentence construction for better clarity and flow. + 5). Tone and Brand Alignment: Adjust the tone, voice, personality of given content to make it unique. + 6). Optimize Content Structure: Reorganize the content for a more impactful presentation, + including better paragraphing and transitions. + 7). Remove Redundancies: Important, Cut out any redundant information or overly complex jargon. + 8). Refine Overall Structure: Make structural changes to improve the overall impact of the content. + 9). Remember, rewrite all content that is repeated, while maintaining the formatting of the given blog text. + + Please apply these changes directly to the following blog text and provide the edited version: + {blog_content}
""" + + try: + # TBD: Add logic for which_provider and which_model + response = openai_chatgpt(prompt) + return response + except Exception as err: + raise SystemError(f"Error Blog Proof Reading: {err}") from err diff --git a/lib/combine_research_and_blog.py b/lib/combine_research_and_blog.py new file mode 100644 index 00000000..18febd75 --- /dev/null +++ b/lib/combine_research_and_blog.py @@ -0,0 +1,40 @@ +def blog_with_research(report, blog): + """Combine the given online research and gpt blog content""" + + prompt = f""" + You are an expert copywriter specializing in content optimization for SEO. + I will provide you with a research report and a blog content on the same topic. + Treat the research report as the context for the blog and better it accordingly. + Your task is to transform and combine the given research and blog content into a well-structured, unique + and engaging blog article. + Your objectives include: + 1. Master the report and blog content: Understand main ideas, key points, and the core message. + 2. Sentence Structure: Rephrase while preserving logical flow and coherence. + 3. Identify Main Keyword: Determine the primary topic and combine the articles on the main topic. + 4. Keyword Integration: Naturally integrate keywords in headings, subheadings, and body text, avoiding overuse. + 5. Write Unique Content: Avoid direct copying from given report and blog; rewrite in your own words and style. + 6. Optimize for SEO: Generate high quality informative content. + Implement SEO best practises with appropriate keyword density. + 7. Craft Engaging and Informative Article: Provide value and insight to readers. + 8. Proofread: Important to Check for grammar, spelling, and punctuation errors. + 9. Use Creative and Human-like Style: Incorporate contractions, idioms, transitional phrases, + interjections, and colloquialisms. Avoid repetitive phrases and unnatural sentence structures. + 10.
Structuring: Include an Introduction, subtopics and use bullet points or + numbered lists if appropriate. Important to include FAQs, and Conclusion. + 11. Ensure Uniqueness: Guarantee the article is plagiarism-free. Write in unique, informative style. + 12. Punctuation: Use appropriate question marks at the end of questions. + 13. Pass AI Detection Tools: Create content that easily passes AI plagiarism detection tools. + 14. REMEMBER to give final response as complete HTML. + Follow these guidelines to create a well-optimized, unique, and informative article + that will rank well in search engine results and engage readers effectively. + + Create a blog post from the given research report and blog content below. + Research report: {report} + Blog content: {blog} + """ + try: + # TBD: Add logic for which_provider and which_model + response = openai_chatgpt(prompt) + return response + except Exception as err: + raise SystemError(f"Error in combining research report and blog content: {err}") from err diff --git a/lib/config.json b/lib/config.json new file mode 100644 index 00000000..223ea29e --- /dev/null +++ b/lib/config.json @@ -0,0 +1,8 @@ +{ + "wordpress_url": "https://latestaitools.in/", + "wordpress_username": "username", + "wordpress_password": "password", + "image_dir": "path/to/image_dir", + "output_path": "path/to/output_path" +} + diff --git a/lib/convert_content_to_markdown.py b/lib/convert_content_to_markdown.py new file mode 100644 index 00000000..c6d67dc8 --- /dev/null +++ b/lib/convert_content_to_markdown.py @@ -0,0 +1,27 @@ +def convert_tomarkdown_format(blog_content): + """ Helper for converting content to markdown format for static sites. """ + prompt = f""" + You are an expert in markdown language format and front matter, as used for static webpages. + Your task is to convert and improve formatting of given blog content. + Do Not modify the content, only modify to convert it into highly readable blog content. + + Use below guidelines and include other best practises: + 1).
Headers for Structure: Use # for main headings and increase the number of # for + subheadings (##, ###, etc.). Organize given content into clear, hierarchical sections. + 2). Emphasizing Text: Use single asterisks or underscores for italic (*italic* or _italic_), + double for bold (**bold** or __bold__), and triple for bold italic (***bold italic***). + 3). Lists: For unordered lists, use dashes, asterisks, or plus signs (-, *, +). + For ordered lists, use numbers followed by periods (1., 2., etc.). + 4). Blockquotes: Use > for blockquotes, and add additional > for nested blockquotes. + 5). Code Blocks: Use backticks for inline code (`code`) and triple backticks for code blocks. + Specify a language for syntax highlighting. + 6). Horizontal Lines: Create a horizontal line using three or more asterisks, dashes, or underscores (---, ***). + 7). Table Formatting: Use pipes | and dashes - to create tables. Align text with colons. + + Convert the given blog content into well organised markdown content: {blog_content}""" + try: + # TBD: Add logic for which_provider and which_model + response = openai_chatgpt(prompt) + return response + except Exception as err: + raise SystemError(f"Error in converting to Markdown format: {err}") from err diff --git a/lib/convert_markdown_to_html.py b/lib/convert_markdown_to_html.py new file mode 100644 index 00000000..403c0cb6 --- /dev/null +++ b/lib/convert_markdown_to_html.py @@ -0,0 +1,35 @@ +def convert_markdown_to_html(md_content): + """ Helper function to convert given text to HTML + """ + prompt = f""" + You are a skilled web developer tasked with converting a Markdown-formatted text to HTML. + You will be given text in markdown format. Follow these steps to perform the conversion: + + 1. Parse User's Markdown Input: You will receive a Markdown-formatted text as input from the user.
+ Carefully analyze the provided Markdown text, paying attention to different elements such as headings (#), + lists (unordered and ordered), bold and italic text, links, images, and code blocks. + 2. Generate and Validate HTML: Generate corresponding HTML code for each Markdown element following + the conversion guidelines below. Ensure the generated HTML is well-structured and syntactically correct. + 3. Preserve Line Breaks: Markdown line breaks (soft breaks) represented by two spaces at the end of a + line should be converted to
<br> tags in HTML to preserve the line breaks. + 4. REMEMBER to generate complete, valid HTML response only. + + Follow below Conversion Guidelines: + - Headers: Convert Markdown headers (#, ##, ###, etc.) to corresponding HTML header tags (<h1>, <h2>, <h3>, etc.). + - Lists: Convert unordered lists (*) and ordered lists (1., 2., 3., etc.) to