WIP000.1- AI content writer

AjaySi
2024-01-03 16:59:17 +05:30
parent 8f89de7b69
commit b51e9a8c2f
32 changed files with 854 additions and 506 deletions


@@ -0,0 +1,47 @@
## Prompting settings for LLMs
**Model:** Choose which LLM to use. Different models have different strengths and weaknesses. For example, some models are better at generating creative text formats, while others are better at answering questions in an informative way.
**Temperature:** The temperature setting controls how random the LLM's word choices are. A higher temperature setting will result in more creative and varied text, but it may also be less accurate or coherent. A lower temperature setting will result in more focused and repeatable text, but it may also be less creative and varied.
**Top P:** The top P setting (nucleus sampling) restricts the LLM to the smallest set of most likely words whose combined probability reaches P. A lower top P setting will result in more predictable text, but it may also be less interesting. A higher top P setting will allow more interesting and creative text, but it may also be less predictable.
**Frequency penalty:** The frequency penalty setting lowers the likelihood of a word in proportion to how often it has already appeared in the text so far. A higher frequency penalty setting will result in less repetitive text, but it may also be less fluent. A lower frequency penalty setting will result in more fluent text, but it may also be more repetitive.
**Presence penalty:** The presence penalty setting lowers the likelihood of any word that has already appeared in the text so far, no matter how often. A higher presence penalty setting pushes the LLM toward new words and topics, which can be more creative but may drift away from the prompt. A lower presence penalty setting keeps the text closer to what has already been said, but it may also be less creative.
These settings can help you get better results from LLMs, depending on what you are trying to do. For example, if you are trying to generate a creative poem, you might want to use a higher temperature setting and a higher presence penalty setting. If you are trying to get an answer to a technical question, you might want to use a lower temperature setting and leave the penalty settings at or near zero.
Experiment with the different settings to see what works best for you.
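To make the temperature and top P settings concrete, here is a minimal sketch of how they are commonly applied to a model's raw scores (logits) before a word is sampled. The function names `apply_temperature` and `top_p_filter` and the toy logits are illustrative assumptions, not part of any particular LLM API.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    highest = max(scaled)
    exps = [math.exp(score - highest) for score in scaled]  # subtract max for stability
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches top_p, then renormalise. Returns {token_index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    for temperature in (0.5, 1.0, 2.0):
        print(temperature, apply_temperature(toy_logits, temperature))
    print(top_p_filter(apply_temperature(toy_logits, 1.0), top_p=0.9))
```

With a low temperature most of the probability lands on the top-scoring token, and with a low top P the less likely tokens are removed from the candidate set entirely.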
## Components of a prompt
**Instruction:** The specific task or instruction that you want the LLM to perform. For example, you might want it to generate a poem, translate a sentence, or answer a question.
**Context:** External information or additional context that can help the LLM to better understand your request and generate a better response. For example, if you are asking the LLM to generate a poem, you might provide it with the topic of the poem or the style of poem that you want.
**Input data:** The input or question that you are asking the LLM to respond to. For example, if you are asking the LLM to translate a sentence, you would provide it with the sentence that you want translated.
**Output indicator:** The type or format of the output that you want the LLM to generate. For example, you might ask for a bulleted list, a table, a JSON object, or a one-sentence answer.
You don't need to include all four of these components in a prompt, but the more information you can provide to the LLM, the better it will be able to understand your request and generate a good response.
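As an illustration, the four components can be assembled into a single prompt string. This is only a sketch: the helper name `build_prompt` and the labels it prints are assumptions made for the example, and any component that is not supplied is simply left out.

```python
def build_prompt(instruction, context=None, input_data=None, output_indicator=None):
    """Join whichever prompt components were supplied, one labelled block each."""
    parts = [
        ("Instruction", instruction),
        ("Context", context),
        ("Input", input_data),
        ("Output format", output_indicator),
    ]
    # Skip components that were not provided (None or empty string)
    return "\n\n".join(f"{label}: {value}" for label, value in parts if value)


if __name__ == "__main__":
    print(build_prompt(
        instruction="Translate the sentence into French.",
        context="The translation is for a formal business email.",
        input_data="We look forward to working with you.",
        output_indicator="Reply with the translated sentence only.",
    ))
```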
## Tips for prompting large language models
**Be clear and concise in your prompt.** The LLM should be able to understand exactly what you are asking it to do.
**Provide context for your prompt.** This can help the LLM to generate a better response. For example, if you are asking the LLM to write a poem, you might provide it with the topic of the poem or the style of poem that you want.
**Break down complex tasks into smaller steps.** This can help the LLM to better understand what you are asking it to do and to generate a more accurate response.
**Use examples to illustrate your prompt.** This can help the LLM to understand what you are asking for and to generate a more relevant response.
**Test your prompts on different LLMs.** Different LLMs have different strengths and weaknesses, so some prompts may work better with certain LLMs than others.
**Additional tips:**
* **Use the right model.** Different LLMs are better at different tasks. For example, some LLMs are better at generating creative text formats, while others are better at answering questions in an informative way.
* **Tune the prompting settings.** There are a number of prompting settings that you can adjust to get better results from LLMs. For example, you can adjust the temperature setting to control how creative the LLM is, or the top P setting to control how wide a range of candidate words the LLM chooses from.
* **Experiment with different prompts.** The best way to find out what works for you is to experiment with different prompts. Try different ways of phrasing your prompt and see what works best.
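
The tip about using examples is often called few-shot prompting. A minimal sketch, assuming a hypothetical helper named `few_shot_prompt` and an "Input:/Output:" layout chosen only for illustration:

```python
def few_shot_prompt(task, examples, query):
    """Build a prompt that shows worked examples before the real input."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # End with the real input and an open "Output:" for the model to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(few_shot_prompt(
        task="Classify the sentiment of each review as positive or negative.",
        examples=[("I loved it", "positive"), ("Waste of money", "negative")],
        query="Great value for the price",
    ))
```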


@@ -0,0 +1,67 @@
#####################################################
# Act as travel guide:
#####################################################
I want you to act as a travel guide. I will write you my location and you will suggest a place to visit near my location. In some cases, I will also give you the type of places I want to visit. You will also suggest places of a similar type that are close to my first location. My first suggestion request is {}
#####################################################
#
# Use the prompts below to generate ideas, topics, and titles to write on.
#
#####################################################
# This is basically keyword research for a specific domain, narrowed down by blog topics.
# We can craft prompts to get an idea of what to generate blogs on.
# Divide them by topic and write for the most searched ones, as below:
When using GPT to generate content, it is important to provide it with clear and concise instructions.
For example, if you are asking GPT to generate a blog post outline, you should provide it with the following information:
- Topic: What is the topic of the blog post?
- Audience: Who is the target audience for the blog post?
- Purpose: What is the purpose of the blog post? (To inform, entertain, sell, etc.)
- Keywords: What keywords do you want the blog post to rank for?
-------------------------------------------------------------------
Generate a list of the top {X} most popular and semantically related keywords and entities for the topic of {X}, categorized by search intent (informational, commercial, transactional).
Generate a list of the top {X} most popular and semantically related long-tail keywords and entities for the topic of {X}, categorized by buyer stage (awareness, consideration, decision).
Generate a list of the top {X} most popular and semantically related keywords and entities for the topic of {X} that are relevant to my target audience (e.g., small businesses in the United States).
Generate a list of the top {X} most popular and semantically related keywords and entities for the topic of {X} that are used in high-quality content.
Generate a list of the top {X} most popular and semantically related keywords and entities for the topic of {X} that are relevant to my competitors.
-------------------------------------------------------------------
--- Write seven subheadings for the blog article with the title [title]; the titles should be catchy and 60 characters max.
--- List the top 5 most popular long tail keywords for the topic [YOUR TOPIC]
--- What Are The {X} Most Popular Sub-topics Related To {Topic}?
--- What Are The {X} Most Popular Sub-topics Related To {Sub-topic}?
--- List Without Description The Top {X} Most Popular Keywords For The Topic Of {X}
--- List Without Description The Top {X} Most Popular Long-tail Keywords For The Topic “{X}”
--- List Without Description The Top Semantically Related Keywords And Entities For The Topic {X}
--- Give me five popular keywords that contain “SEO” followed by a word starting with the letter a. Once that is done, give me five more popular keywords that contain “SEO” for each remaining letter of the alphabet, b to z.
--- For the topic of “{Topic}” list 10 keywords each for the different types of user personas
--- Generate 50 keywords for the topic “[Topic]” that contain “vs”
--- Perform the following steps in a consecutive order Step 1, Step 2, Step 3, Step 4.
Step 1 Generate the 5 most popular keywords related to the topic of “keyword” with their search intent.
Step 2 For each keyword provide 2 long-tail keywords.
Step 3 Generate the 5 most popular questions that include those keywords.
Step 4 Generate 5 blog article titles based on the keywords from Step 1 and Step 2.
--- As a technical writer experienced in SEO, please create a detailed blog post outline that provides a step-by-step guide
for using [X], targeting beginners with a friendly and helpful tone and a desired length of 800-1000 words.
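
The templates above reuse {X} for both the number of results and the topic, so they cannot be filled in automatically as written. A minimal sketch of one way to do it, assuming the placeholders are renamed to {count} and {topic} (both names, and the helper `fill_prompt`, are hypothetical):

```python
# One of the keyword prompts above, with {X} renamed to distinct placeholders
KEYWORD_PROMPT = (
    "Generate a list of the top {count} most popular and semantically related "
    "keywords and entities for the topic of {topic}, categorized by search "
    "intent (informational, commercial, transactional)."
)


def fill_prompt(template, **values):
    """Substitute named placeholders; raises KeyError if one is missing."""
    return template.format(**values)


if __name__ == "__main__":
    print(fill_prompt(KEYWORD_PROMPT, count=10, topic="home coffee roasting"))
```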


@@ -0,0 +1,33 @@
## Implementation approach
To implement the SEO module, we will use the following open-source tools and frameworks:
1. Natural Language Toolkit (NLTK): NLTK is a popular library for natural language processing in Python. We can leverage NLTK to perform various SEO checks on the given text, such as keyword density, readability analysis, and sentiment analysis.
2. Beautiful Soup: Beautiful Soup is a Python library for web scraping. We can use Beautiful Soup to extract relevant information from the given text, such as meta tags, headings, and image alt attributes.
3. PyEnchant: PyEnchant is a spell checking library for Python. We can utilize PyEnchant to check the spelling of the given text and provide suggestions for improvement. PyEnchant does not check grammar, so grammar checks would need a separate tool.
4. TextBlob: TextBlob is a library for processing textual data. We can use TextBlob to perform part-of-speech tagging, noun phrase extraction, and other linguistic analyses on the given text.
5. Flask: Flask is a lightweight web framework for Python. We can use Flask for local testing and development, since it allows us to quickly build and test our SEO module.
Overall, by leveraging these open-source tools and frameworks, we can develop a comprehensive and efficient SEO module that meets the requirements and provides valuable insights and suggestions for improving the SEO of the given text.
## Required Python third-party packages
- nltk==3.6.2
- beautifulsoup4==4.9.3
- pyenchant==3.2.1
- textblob==0.15.3
- flask==1.1.2
## Modules
The 'text_processor.py' file contains the TextProcessor class, which is responsible for extracting meta tags, headings, and image alt attributes from the given text.
The 'spell_checker.py' file contains the SpellChecker class, which is responsible for checking the spelling of the given text.
The 'seo_checker.py' file contains the SEOChecker class, which is responsible for coordinating the SEO checks by utilizing the TextProcessor and SpellChecker classes.


@@ -0,0 +1,135 @@
###################################################
#
# The script covers many SEO factors, including keyword presence, title length,
# meta description, images, img alt text, headings, internal links, external links,
# spelling errors, grammar errors, and readability.
#
##################################################
import re

from bs4 import BeautifulSoup
from spellchecker import SpellChecker  # installed as the "pyspellchecker" package
from textstat import flesch_reading_ease


class SEOAnalyzer:
    def __init__(self, html_content, target_keywords):
        self.html_content = html_content
        self.target_keywords = target_keywords

    def analyze_html_content(self):
        try:
            soup = BeautifulSoup(self.html_content, 'html.parser')

            # Extract and clean text from HTML
            text = ' '.join(soup.stripped_strings)
            text = re.sub(r'\s+', ' ', text)
            text_lower = text.lower()
            words = text.split()

            # Calculate keyword density as a percentage of all words.
            # Note: this counts substring matches, so 'content' also matches 'contents'.
            keyword_density = {}
            for keyword in self.target_keywords:
                occurrences = text_lower.count(keyword.lower())
                keyword_density[keyword] = (occurrences / len(words)) * 100 if words else 0.0

            # Check for the presence of keywords in the title
            title_tag = soup.find('title')
            title_text = title_tag.text.strip().lower() if title_tag else ''
            keyword_presence_in_title = {keyword: keyword.lower() in title_text for keyword in self.target_keywords}

            # Check for the presence of images and keywords in image alt text
            images = soup.find_all('img')
            img_alt_text = [img.get('alt', '').lower() for img in images]
            keyword_presence_in_img_alt_text = {keyword: any(keyword.lower() in alt_text for alt_text in img_alt_text) for keyword in self.target_keywords}

            # Check for the presence of headings
            headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
            headings_text = ' '.join(heading.text.lower() for heading in headings)

            # Count internal and external links. Without knowing the site's own
            # domain, absolute http(s) URLs are treated as external and every
            # other non-empty href (relative paths, #fragments) as internal.
            hrefs = [link.get('href', '') for link in soup.find_all('a')]
            external_links = sum(1 for href in hrefs if href.startswith(('http://', 'https://')))
            internal_links = sum(1 for href in hrefs if href and not href.startswith(('http://', 'https://')))

            # Calculate readability score
            readability_score = flesch_reading_ease(text)

            # Check for spelling errors. pyspellchecker has no grammar checker,
            # so grammar errors are reported as None (not checked).
            spell = SpellChecker()
            spelling_errors = len(spell.unknown(re.findall(r"[A-Za-z']+", text)))
            grammar_errors = None

            # Calculate SEO score
            seo_score = 0

            # Check for the presence of relevant keywords (case-insensitive)
            for keyword in self.target_keywords:
                if keyword.lower() in text_lower:
                    seo_score += 1

            # Check for title length, measured in characters
            title_length = len(title_text)
            recommended_title_length = (50, 70)
            if recommended_title_length[0] <= title_length <= recommended_title_length[1]:
                seo_score += 1

            # Generate suggestions for improvement, one per failed check
            suggestions = []
            if any(keyword.lower() not in text_lower for keyword in self.target_keywords):
                suggestions.append("Add more relevant keywords to your HTML content.")
            if not all(keyword_presence_in_title.values()):
                suggestions.append("Make sure your title contains keywords.")
            if not all(keyword_presence_in_img_alt_text.values()):
                suggestions.append("Add keywords to image alt text.")
            if not headings:
                suggestions.append("Add headings to your HTML content.")
            if internal_links == 0:
                suggestions.append("Add internal links to your HTML content.")

            return {
                'Keyword Density': keyword_density,
                'Keyword Presence in Title': keyword_presence_in_title,
                'Keyword Presence in Image Alt Text': keyword_presence_in_img_alt_text,
                'Headings Text': headings_text,
                'Internal Links': internal_links,
                'External Links': external_links,
                'Readability Score': readability_score,
                'Spelling Errors': spelling_errors,
                'Grammar Errors': grammar_errors,
                'SEO Score': seo_score,
                'Suggestions': suggestions
            }
        except Exception as e:
            return {'error': str(e)}


# Example usage:
if __name__ == "__main__":
    html_content = """
    <!DOCTYPE html>
    <html>
    <head>
        <title>SEO Analyzer - Sample Page</title>
        <meta name="description" content="This is a sample page for SEO analysis.">
    </head>
    <body>
        <h1>Welcome to the SEO Analyzer</h1>
        <p>This is a sample page with some sample content for SEO analysis. It mentions the target keywords SEO, keywords, and content.</p>
        <img src="image1.jpg" alt="SEO image">
        <img src="image2.jpg" alt="Keywords image">
    </body>
    </html>
    """
    keywords = ['SEO', 'keywords', 'content']  # Replace with your target keywords
    seo_analyzer = SEOAnalyzer(html_content, keywords)
    results = seo_analyzer.analyze_html_content()

    if 'error' in results:
        # The analyzer returns {'error': ...} instead of raising
        print(f"SEO analysis failed: {results['error']}")
    else:
        print("SEO Analysis Results:")
        print(f"Keyword Density: {results['Keyword Density']}")
        print(f"Keyword Presence in Title: {results['Keyword Presence in Title']}")
        print(f"Keyword Presence in Image Alt Text: {results['Keyword Presence in Image Alt Text']}")
        print(f"Headings Text: {results['Headings Text']}")
        print(f"Internal Links: {results['Internal Links']}")
        print(f"External Links: {results['External Links']}")
        print(f"Readability Score: {results['Readability Score']}")
        print(f"Spelling Errors: {results['Spelling Errors']}")
        print(f"Grammar Errors: {results['Grammar Errors']}")
        print(f"SEO Score: {results['SEO Score']}")
        print("Suggestions:")
        for suggestion in results['Suggestions']:
            print(suggestion)
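
The keyword density in the script above is computed with `str.count`, which also counts a keyword when it appears inside a longer word. The following is a minimal sketch of a whole-word variant using only the standard library; the function name `keyword_density` is an assumption for this example and it is not wired into the `SEOAnalyzer` class.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in `text` accounted for by whole-word matches of `keyword`.

    Matching is case-insensitive. A multi-word keyword is counted once per
    phrase occurrence, which is a simplification.
    """
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    pattern = r"\b" + re.escape(keyword) + r"\b"
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    return 100 * len(matches) / len(words)


if __name__ == "__main__":
    sample = "SEO tips and SEO tools"
    print(keyword_density(sample, "seo"))
```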


@@ -0,0 +1,65 @@
##############################################################################################
#
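# Checks for:
# Explicit phrases that label the text as AI-generated.
# The same token repeated three times in a row.
# Short, fragmented sentences that lack human-like coherence.
# Frequent use of overly complex words or technical jargon.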
#
# These checks are based on common observations that AI-generated content may sometimes produce
# text with unusual patterns or characteristics. However, please keep in mind that these
# heuristics are not guaranteed to detect all AI-generated content, and false positives or
# negatives can still occur. More advanced techniques and models would be required for more accurate detection.
#
#############################################################################################
import spacy

# Load the English language model from spaCy
nlp = spacy.load("en_core_web_sm")


def is_ai_generated(text):
    # Tokenize the text using spaCy
    doc = nlp(text)

    # Check for indicators of AI-generated content
    ai_indicators = [
        "generated by AI",
        "auto-generated",
        "machine-generated",
        "artificial intelligence",
        "neural network",
        "GPT-3",
        "AI model",
    ]
    for indicator in ai_indicators:
        if indicator.lower() in text.lower():
            return True

    # Check for repetitive patterns or lack of human-like variations
    for i in range(len(doc) - 2):
        if doc[i].text == doc[i + 1].text == doc[i + 2].text:
            return True

    # Check for short, fragmented sentences that lack human-like coherence
    for sentence in doc.sents:
        if len(sentence) < 5:
            return True

    # Check for frequent use of overly complex words or technical jargon
    complex_word_count = sum(1 for token in doc if token.is_alpha and len(token.text) > 10)
    if complex_word_count > len(doc) // 10:  # Adjust the threshold as needed
        return True

    return False


if __name__ == "__main__":
    input_text = """
    This is an article generated by a state-of-the-art AI model.
    The content is machine-generated and may not represent human writing style.
    """
    if is_ai_generated(input_text):
        print("The content appears to be AI-generated.")
    else:
        print("The content appears to be written by a human.")


@@ -0,0 +1,80 @@
import re
#######################################################################
#
# I want you to act as a plagiarism checker. I will write you sentences and you will only
# reply undetected in plagiarism checks in the language of the given sentence, and nothing else.
# Do not write explanations on replies. My first sentence is {}
#
######################################################################
class PlagiarismChecker:
    def __init__(self, known_sources):
        self.known_sources = known_sources

    def check_plagiarism(self, html_content):
        try:
            # Preprocess the HTML content by removing HTML tags and extra spaces
            text = re.sub(r'<[^>]+>', ' ', html_content)
            text = re.sub(r'\s+', ' ', text).strip().lower()

            # Check for exact matches with known sources
            for source in self.known_sources:
                source_text = re.sub(r'<[^>]+>', ' ', source)
                source_text = re.sub(r'\s+', ' ', source_text).strip().lower()
                if text == source_text:
                    return f"Plagiarism detected: Matches known source - {source}"

            # If no exact matches are found
            return "No plagiarism detected. Content is original."
        except Exception as e:
            return str(e)


# Example usage:
if __name__ == "__main__":
    # List of known sources
    known_sources = [
        """
        <html>
        <head>
            <title>Sample Page 1</title>
        </head>
        <body>
            <h1>Hello, World!</h1>
            <p>This is sample content from known source 1.</p>
        </body>
        </html>
        """,
        """
        <html>
        <head>
            <title>Sample Page 2</title>
        </head>
        <body>
            <h1>Welcome to Known Source 2</h1>
            <p>This is some content from another known source.</p>
        </body>
        </html>
        """
    ]

    # HTML content to check for plagiarism
    html_content = """
    <html>
    <head>
        <title>Sample Page</title>
    </head>
    <body>
        <h1>Hello, World!</h1>
        <p>This is sample content.</p>
    </body>
    </html>
    """

    plagiarism_checker = PlagiarismChecker(known_sources)
    result = plagiarism_checker.check_plagiarism(html_content)
    print(result)
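
The checker above only reports a match when the whole cleaned text is identical to a known source, so lightly edited copies pass. As a possible extension, and not what the class currently does, near-duplicates could be scored with Jaccard similarity over word shingles. The function names and the shingle size of 3 below are assumptions; the inputs are expected to be plain text with HTML tags already removed.

```python
import re


def word_shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in `text`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Shared shingles divided by all distinct shingles, between 0.0 and 1.0."""
    shingles_a = word_shingles(text_a, size)
    shingles_b = word_shingles(text_b, size)
    if not shingles_a or not shingles_b:
        return 0.0
    return len(shingles_a & shingles_b) / len(shingles_a | shingles_b)


if __name__ == "__main__":
    original = "This is sample content from known source 1."
    candidate = "This is sample content from an unknown source."
    print(jaccard_similarity(original, candidate))
```

A caller would compare the score against a threshold chosen for the use case; the right value depends on how much overlap should count as copying.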


@@ -0,0 +1,3 @@
Act as an SEO specialist, analyze [website URL], and make improvement suggestions regarding technical SEO with the ways to make those improvements listed in a table.


@@ -0,0 +1,115 @@
from typing import List, Dict, Union

# word_tokenize, sent_tokenize and pos_tag need NLTK data that is downloaded
# separately (for example the 'punkt' and 'averaged_perceptron_tagger' resources).
from nltk import tokenize, stem, pos_tag
import enchant


class TextPreprocessor:
    def preprocess_text(self, text: str) -> str:
        # Tokenize the text
        tokens = tokenize.word_tokenize(text)
        # Stem the tokens
        stemmer = stem.PorterStemmer()
        stemmed_tokens = [stemmer.stem(token) for token in tokens]
        # Join the stemmed tokens back into a string
        preprocessed_text = ' '.join(stemmed_tokens)
        return preprocessed_text


class SEOAnalyzer:
    def calculate_seo_percentage(self, text: str, keywords: List[str]) -> float:
        # Calculate the keyword density
        keyword_density = self.calculate_keyword_density(text, keywords)
        # Calculate the readability score
        readability_score = self.calculate_readability_score(text)
        # Perform semantic analysis
        semantic_score = self.perform_semantic_analysis(text)
        # Calculate the SEO percentage based on the metrics
        seo_percentage = (keyword_density + readability_score + semantic_score) / 3
        return seo_percentage

    def calculate_keyword_density(self, text: str, keywords: List[str]) -> float:
        # Count the number of occurrences of each keyword in the text
        keyword_counts = {keyword: text.lower().count(keyword.lower()) for keyword in keywords}
        # Calculate the total number of words in the text
        word_count = len(tokenize.word_tokenize(text))
        if word_count == 0:
            return 0.0
        # Calculate the keyword density
        keyword_density = sum(keyword_counts.values()) / word_count
        return keyword_density

    def calculate_readability_score(self, text: str) -> float:
        # Calculate the average number of words per sentence
        sentences = tokenize.sent_tokenize(text)
        word_count = sum(len(tokenize.word_tokenize(sentence)) for sentence in sentences)
        sentence_count = len(sentences)
        if sentence_count == 0 or word_count == 0:
            return 0.0
        average_words_per_sentence = word_count / sentence_count
        # Calculate the readability score (shorter sentences score higher)
        readability_score = 1 / average_words_per_sentence
        return readability_score

    def perform_semantic_analysis(self, text: str) -> float:
        # Perform part-of-speech tagging on the text
        tagged_text = pos_tag(tokenize.word_tokenize(text))
        if not tagged_text:
            return 0.0
        # Calculate the semantic score based on the number of nouns and verbs
        noun_count = sum(1 for word, pos in tagged_text if pos.startswith('N'))
        verb_count = sum(1 for word, pos in tagged_text if pos.startswith('V'))
        semantic_score = (noun_count + verb_count) / len(tagged_text)
        return semantic_score


class SpellChecker:
    def check_spelling(self, text: str) -> List[str]:
        # Create a spellchecker object
        spellchecker = enchant.Dict("en_US")
        # Tokenize the text
        tokens = tokenize.word_tokenize(text)
        # Check the spelling of each alphabetic token (skip punctuation and numbers)
        misspelled_words = [token for token in tokens if token.isalpha() and not spellchecker.check(token)]
        return misspelled_words


class SEOAnalysisModule:
    def __init__(self):
        self.text_preprocessor = TextPreprocessor()
        self.seo_analyzer = SEOAnalyzer()
        self.spell_checker = SpellChecker()

    def analyze_text(self, text: str, keywords: List[str]) -> Dict[str, Union[float, List[str]]]:
        # Stem both the text and the keywords so that inflected forms match
        preprocessed_text = self.text_preprocessor.preprocess_text(text)
        stemmed_keywords = [self.text_preprocessor.preprocess_text(keyword) for keyword in keywords]
        # Calculate the keyword density on the stemmed text
        keyword_density = self.seo_analyzer.calculate_keyword_density(preprocessed_text, stemmed_keywords)
        # Readability, POS tagging and spell checking need the original text:
        # stems such as "thi" are not real words and would distort the results
        readability_score = self.seo_analyzer.calculate_readability_score(text)
        semantic_score = self.seo_analyzer.perform_semantic_analysis(text)
        # Calculate the SEO percentage as the mean of the three metrics
        seo_percentage = (keyword_density + readability_score + semantic_score) / 3
        # Check the spelling
        spelling_errors = self.spell_checker.check_spelling(text)
        return {
            'seo_percentage': seo_percentage,
            'keyword_density': keyword_density,
            'readability_score': readability_score,
            'semantic_score': semantic_score,
            'spelling_errors': spelling_errors
        }


@@ -0,0 +1,57 @@
"""
At the command line, only need to run once to install the package via pip:
$ pip install google-generativeai
"""
import os
import sys

import google.generativeai as genai


def research_yt(keywords):
    """ Research top youtube videos for given keywords """
    try:
        genai.configure(api_key=os.getenv('GEMINI_API_KEY'))
    except Exception as err:
        print(f"Google Gemini Error: {err}")
        sys.exit(1)

    # Set up the model
    generation_config = {
        "temperature": 0.9,
        "top_p": 1,
        "top_k": 1,
        "max_output_tokens": 2048,
    }
    safety_settings = [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        },
    ]
    model = genai.GenerativeModel(model_name="gemini-pro",
                                  generation_config=generation_config,
                                  safety_settings=safety_settings)

    prompt_parts = [f"Research 5 latest youtube urls on {keywords}, released this week. Check the number of views and also get the references from youtube video description. REMEMBER to make sure, your response urls are available and valid. For each result, visit their webpages to write detailed quickstart code samples, preferably in python. Your response urls should consist of trending topics on latest {keywords}. Your response should be in json format, so that I can easily parse all the fields. For consistency, always use json key names as Title, URL, Views, References and Quickstart_Code."]

    try:
        response = model.generate_content(prompt_parts)
    except Exception as err:
        # 'response' is not defined when the call fails, so report the error itself
        print(f"Failed to get response from Gemini Pro: {err}")
        sys.exit(1)
    return response.text
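
Since the prompt asks for JSON, the text returned by `research_yt` still has to be parsed, and models sometimes wrap JSON in a Markdown code fence. Below is a minimal sketch of a tolerant parser using only the standard library. The helper name `parse_json_response` is an assumption, and nothing here guarantees that the model actually returns valid JSON or the requested keys.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker, built this way to keep the source readable


def parse_json_response(response_text):
    """Parse model output as JSON, tolerating a surrounding Markdown code fence.

    Returns the parsed object, or None if the text is not valid JSON.
    """
    cleaned = response_text.strip()
    if len(cleaned) >= 2 * len(FENCE) and cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        cleaned = cleaned[len(FENCE):-len(FENCE)]
        # Drop an optional language tag such as "json" on the opening fence line
        first_line, _, rest = cleaned.partition("\n")
        if first_line.strip().lower() in ("", "json"):
            cleaned = rest
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    sample = FENCE + 'json\n[{"Title": "Example", "Views": 100}]\n' + FENCE
    print(parse_json_response(sample))
```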