diff --git a/README.md b/README.md
index 69f39a16..6d5f323a 100644
--- a/README.md
+++ b/README.md
@@ -5,28 +5,15 @@ This toolkit automates and enhances the process of blog creation, optimization,
## Features
-### Blog Generation and Optimization
-- **YouTube to Blog Conversion**: Converts YouTube videos into detailed blog posts by extracting and transcribing audio, then generating text-based content. TBD: Audio to blog.
-
- **Online Research Integration**: Enhances blog content by integrating insights and information gathered from online research, ensuring the content is informative and up-to-date. This gives context for generating content. Tavily AI, Google search, serp and Vision AI is used to scrape web data for context augumentation. TBD: Include CrewAI for web research agents.
- **Image Generation and Processing**: Utilizes AI models like DALL-E 3, stable difffusion to create relevant images based on blog content. Offers features to process and optimize images for web usage. FIXME: Need more work with stable diffusion.
-- **Write Scholarly Article**: Does search for given keywords, arxiv IDs and write review or blog on research papers. Basically, PDF to Blog.
-
-- **Write blogs from PDFs**: TBD . The code is there, need to abstract/extract it. There is RAG with llamaindex for 'n' pdfs.
-- **
- **SEO Optimization**: Employs AI to generate SEO-friendly blog titles, meta descriptions, tags, and categories. Ensures content is optimized for search engines.
-- **Blog Output formats**: For easy upload to website, blogs output format can be in plaintext, HTML, Mardown/MLA format.
-
-- **Wordpress Integration**: Implemented generating and uploading blog content, media to wordpress via its REST APIs. Most of the static website which can work with markdown style should work with little testing.
+- **WordPress, Jekyll Integration**: Generates blog content and uploads it, along with media, to WordPress via its REST API. Most static site generators that accept Markdown should work with little testing.
-### Speech-to-Text Conversion
-- **Audio Transcription**: Converts speech from video content into text, facilitating the creation of blogs and articles from video sources.
-- **AI models used**: OpenAI whisper model, (TBD) AssemblyAI
-
### AI-Driven Content Creation
- **Text Generation**: Leverages OpenAI's ChatGPT, Google Gemini Pro for generating text for blogs.
- **Customizable AI Parameters**: (FIXME) Offers flexibility in adjusting AI parameters like model selection, temperature, and token limits to suit different content needs.
@@ -35,64 +22,62 @@ This toolkit automates and enhances the process of blog creation, optimization,
- **Analyzing and Extracting Image Details**: Uses OpenAI's Vision API, Google Gemini vision to analyze images and extract details such as alt text, descriptions, titles, and captions, enhancing the SEO of image content.
---
-
-## Installation and Configuration
-1. **Clone the Repository**: Clone the toolkit from the provided repository link.
-2. **Install Dependencies**: Install necessary Python packages and libraries.
-
-
-## Installation
----
-
**Note**: This toolkit is designed for automated blog management and requires appropriate API keys and access credentials for full functionality.
-
-### 1). Prerequisites: pip install requirements.txt
-```
-pip install -r requirements.txt
-```
---
-### 2). OpenAI, Gemini API keys
-Create a file .env in the present directory and include OpenAI keys.
-FIXME: The code is little messed up here.
+### Web Research
+- **Keyword Research**: Conduct in-depth keyword research by specifying search queries and time ranges.
+- **Domain-Specific Searches**: Include specific URLs to confine searches to certain domains, such as Wikipedia or competitor websites.
+- **Semantic Analysis**: Explore similar topics and technologies by providing a reference URL for semantic analysis.
----
+### Competitor Analysis
+- **Similar Company Discovery**: Analyze competitor websites to discover similar companies, startups, and technologies.
+- **Industry Insights**: Gain insights into industry trends, market competitors, and emerging technologies.
-This is in active development and needs ironing out. The main concern is make it general purpose, for all.
-Usuability and extendibility are major concerns. This section will be updated soon.
+### Blog Writing
+- **Keyword-Based Blogs**: Generate blog content based on specified keywords, leveraging AI to produce engaging and informative articles.
+- **Audio Blog Generation**: Convert audio from YouTube videos into blog posts, facilitating content creation from multimedia sources.
+- **GitHub Repository Blogs**: Transform GitHub repositories or topics into blog posts, showcasing code examples and project insights.
+- **Scholarly Research Blogs**: Generate blog content based on research papers, summarizing key findings and insights.
-usage: pseo_main.py [-h] [--csv CSV] [--keywords KEYWORDS] [--youtube_urls YOUTUBE_URLS] [--scholar SCHOLAR] [--niche] [--wordpress]
- [--output_format {plaintext,markdown,html}]
+### Blogging Tools
+- **Title and Meta Description Generation**: Generate catchy titles and meta descriptions for blog posts to improve SEO and user engagement.
+- **Blog Outline Creation**: Generate outlines for blog posts, aiding in structuring content and organizing ideas.
+- **FAQ Generation**: Automatically generate FAQs (Frequently Asked Questions) based on blog content, enhancing user engagement and SEO.
+- **HTML and Markdown Conversion**: Convert blog posts between HTML and Markdown formats for easy integration with various platforms.
+- **Blog Proofreading**: Proofread blog content for grammar, spelling, and readability, ensuring high-quality output.
+- **Tag and Category Suggestions**: Generate tags and categories for blog posts based on content analysis, improving organization and discoverability.
-options:
- -h, --help show this help message and exit
- --csv CSV Provide path csv file. Check the template csv for example.
- --keywords KEYWORDS Keywords for blog generation.
- --youtube_urls YOUTUBE_URLS
- Comma-separated YouTube URLs for blog generation.
- --scholar SCHOLAR Write blog from latest research papers on given keywords. Use 'arxiv_papers_url' to provide a file arxiv url
- list.
- --niche Flag to generate niche blogs (default: False).
- --wordpress Flag to upload blogs to WordPress (default: False).
- --output_format {plaintext,markdown,html}
- Output format of the blogs (default: plaintext).
+### Interactive Mode
+- **User-Friendly Interface**: Navigate tasks and options easily through an interactive command-line interface.
+- **Menu-Driven Interaction**: Choose between various options, tasks, and tools using intuitive menus and prompts.
+- **Task Guidance**: Receive guidance and instructions for each task, facilitating user interaction and decision-making.
----
+## Packages, Tools, and APIs Used
-**Example Usage:**
-- **Keyword usage**:
-```
-python pseo_main.py --keywords "Writesonic AI SEO-optimized blog writing,PepperType AI virtual content assistant,Copysmith AI enterprise eCommerce content,Copy AI artificial intelligence content generator,Jasper AI creative content platform,Contents generative AI content strategy"
-```
-**YouTube usage**:
-```
-python pseo_main.py --youtube https://www.youtube.com/watch?v=yu27PWzJI_Y,https://www.youtube.com/watch?v=WGzoBD-xthI,https://www.youtube.com/watch?v=zizonToFXDs
-```
-**Scholar usage**:
-```
-python pseo_main.py --scholar "GPT-4 Technical Report"
-```
+- **Libraries**:
+ - PyInquirer: For creating interactive command-line interfaces.
+ - Typer: For building CLI applications with ease.
+ - Tabulate: For formatting data in tabular form.
+ - Requests: For making HTTP requests to web APIs.
+ - python-dotenv: For loading environment variables from a .env file.
+- **APIs**:
+ - Metaphor API: Provides semantic search capabilities for finding similar topics and technologies.
+ - Tavily API: Offers AI-powered web search functionality for conducting in-depth keyword research.
+ - SerperDev API: Enables access to search engine results and competitor analysis data.
+ - OpenAI API: Powers the Large Language Models (LLMs) for generating blog content and conducting research.
+ - Gemini API: Another LLM provider for natural language processing tasks.
+ - Ollama API (Work In Progress): An upcoming LLM provider for additional research and content generation capabilities.
+
+## Getting Started
+
+To use this tool, follow these steps:
+
+1. Clone this repository to your local machine.
+2. Install the required dependencies using `pip install -r requirements.txt`.
+3. Run the script by executing `python blogen.py`.
+4. Set up the necessary API keys by following the instructions provided in the script and adding them to the `.env` file.
---
Notes:
diff --git a/blogen.py b/blogen.py
index 418ee606..ba98fb6c 100644
--- a/blogen.py
+++ b/blogen.py
@@ -13,7 +13,8 @@ load_dotenv(Path('.env'))
app = typer.Typer()
from lib.ai_web_researcher.gpt_online_researcher import gpt_web_researcher
-
+from lib.ai_web_researcher.metaphor_basic_neural_web_search import metaphor_find_similar
+from lib.ai_writers.keywords_to_blog import write_blog_from_keywords
def prompt_for_time_range():
@@ -36,7 +37,8 @@ def write_blog_options():
'type': 'list',
'name': 'blog_type',
'message': 'š Choose a blog type:',
- 'choices': ['Keywords', 'Audio YouTube', 'GitHub', 'Scholar', 'Quit'],
+ 'choices': ['Keywords', 'Audio YouTube', 'Programming',
+ 'Scholar', 'News/TBD','Finance/TBD', 'Quit'],
}
]
answers = prompt(questions)
@@ -55,6 +57,7 @@ def start_interactive_mode():
text.append("\nā ļø Alert! š„āš„\n", style="bold red")
text.append("If you know what to write, choose 'Write Blog'\n", style="bold blue")
text.append("If unsure, lets 'do web research' to write on\n", style="bold red")
+ text.append("If testing it out or getting started, choose 'Blog Tools'\n", style="bold green")
text.append("_______________________________________________________________________\n")
print(text)
@@ -64,28 +67,29 @@ def start_interactive_mode():
'type': 'list',
'name': 'mode',
'message': 'Choose an option:',
- 'choices': ['Write Blog', 'Do Web Research', 'Competitor Analysis', 'FAQ Generator', 'Quit'],
+ 'choices': ['Write Blog', 'Do keyword Research', 'Create Blog Images',
+ 'Competitor Analysis', 'Blog Tools', 'Quit'],
}
]
answers = prompt(questions)
mode = answers['mode']
if mode == 'Write Blog':
write_blog()
- elif mode == 'Do Web Research':
+ elif mode == 'Do keyword Research':
do_web_research()
- elif mode == 'FAQ Generator':
+ elif mode == 'Create Blog Images':
faq_generator()
elif mode == 'Competitor Analysis':
- # https://github.com/com-puter-tips/SEO-Analysis
- # https://github.com/sundios/SEO-Lighthouse-Multiple-URLs
- # https://github.com/Gingerbreadfork/Cutlery
# Metaphor similar search
competitor_analysis()
- elif mode == 'News Analysis':
+ elif mode == 'Recent News Summarizer':
print("""1. Get tavily News.
2. Get metaphor news.
3. Get from NewsApi
4. Get YOU.com News.""")
+ recent_news_summarizer()
+ elif mode == 'Blog Tools':
+ blog_tools()
elif mode == 'Quit':
typer.echo("Exiting, F*** Off!")
raise typer.Exit()
@@ -130,7 +134,7 @@ def check_environment_variables():
if missing_keys:
print("\nMost are Free APIs and really worth your while signing up for them.")
- print(":pile_of_poo::pile_of_poo::pile_of_poo: GO GET THEM, on above urls. [bold red]")
+ print(":pile_of_poo: :pile_of_poo: GO GET THEM, on above urls. [bold red]")
print("Note: They offer free/limited api calls, so we use most of them to have a lot of free api calls.")
print("\n[bold red]TBD: Provide option to use user defined search engines.\n")
for key, description in missing_keys:
@@ -138,11 +142,84 @@ def check_environment_variables():
else:
return True
+
+def check_llm_environs():
+ """ Function to check which LLM api is given. """
+ gpt_provider = os.getenv("GPT_PROVIDER")
+
+ if gpt_provider == "google":
+ api_key_var = "GEMINI_API_KEY"
+ missing_api_msg = f"To get your {api_key_var}, please visit: https://aistudio.google.com/app/apikey"
+ elif gpt_provider == "openai":
+ api_key_var = "OPENAI_API_KEY"
+ missing_api_msg = "To get your OpenAI API key, please visit: https://openai.com/blog/openai-api"
+ else:
+ typer.echo("Unsupported GPT provider specified in GPT_PROVIDER environment variable.")
+ return
+
+ if os.getenv(api_key_var) is None:
+ typer.echo(f"The {api_key_var} environment variable is missing.")
+ typer.echo(missing_api_msg)
+ api_key = typer.prompt(f"Please enter your {api_key_var} API Key:")
+ # Update .env file
+ with open(".env", "a") as env_file:
+ env_file.write(f"{api_key_var}={api_key}\n")
+ typer.echo(f"{api_key_var} API Key added to .env file.")
+ return
+
+ if gpt_provider == "openai" and os.getenv("OPENAI_API_KEY") is None:
+ typer.echo("To get your OpenAI API key, please visit: https://openai.com/blog/openai-api")
+
+
def faq_generator():
return
+def blog_tools():
+ """ Blogging Aid Tools """
+ os.system("clear" if os.name == "posix" else "cls")
+ text = Text()
+ text.append("_______________________________________________________________________")
+ text.append("\n⚠️ Alert!\n", style="bold red")
+ text.append("Collection of Helpful Blogging Tools, powered by LLMs.\n", style="bold green")
+ text.append("_______________________________________________________________________\n")
+
+ print(text)
+
+ # https://developers.google.com/speed/docs/insights/v5/get-started
+ questions = [
+ {
+ 'type': 'list',
+ 'name': 'mode',
+ 'message': 'Choose a Blogging Tool:',
+ 'choices': ['Write Blog Title', 'Write Blog Meta Description', 'Write Blog Introduction',
+ 'Write Blog conclusion', 'Write Blog Outline', 'Generate Blog FAQs', 'Research blog references',
+ 'Convert Blog To HTML', 'Convert Blog To Markdown', 'Blog Proof Reader',
+ 'Get Blog Tags', 'Get blog categories', 'Get Blog Code Examples', 'Quit',
+ 'Check WebPage Performance',],
+ }
+ ]
+ answers = prompt(questions)
+ mode = answers['mode']
+ if mode == 'Write Blog Title':
+ return
+
+
def competitor_analysis():
+ """ Do metaphor similar search """
+ text = Text()
+ text.append("_______________________________________________________________________")
+ text.append("\n⚠️ Alert!\n", style="bold red")
+ text.append("Provide competitor's URL, get details of similar/alternative companies.\n", style="bold red")
+ text.append("Usecases: Know similar companies and alternatives, to given URL\n", style="bold blue")
+ text.append("_______________________________________________________________________\n")
+ print(text)
+ similar_url = typer.prompt("Enter a valid URL to get web analysis")
+
+ try:
+ metaphor_find_similar(similar_url)
+ except Exception as err:
+ print(f"[bold red]Failed to do similar search.\nError: {err}[/bold red]")
return
@@ -153,8 +230,7 @@ def write_blog():
blog_type = write_blog_options()
if blog_type == 'Keywords':
- keywords = typer.prompt("Enter keywords for blog generation:")
- print(f"Write blog based on keywords: {keywords}")
+ blog_from_keyword()
elif blog_type == 'Audio YouTube':
audio_youtube = typer.prompt("Enter YouTube URL for audio blog generation:")
print(f"Write audio blog based on YouTube URL: {audio_youtube}")
@@ -165,10 +241,18 @@ def write_blog():
scholar = typer.prompt("Enter research papers keywords:")
print(f"Write blog based on scholar: {scholar}")
elif blog_type == 'Quit':
- typer.echo("Exiting, Fuck Off!")
+ typer.echo("Exiting, F*** Off!")
raise typer.Exit()
+def blog_from_keyword():
+ """ Write blog from given keyword. """
+ print("Write blog based on keywords.")
+ check_llm_environs()
+ keywords = typer.prompt("Enter 'keywords/Blog Title' for blog generation:")
+ final_blog = write_blog_from_keywords(keywords)
+
+
def do_web_research():
"""
Do Web Research option with time_range, search_keywords, and include_urls sub-options.
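
Review note: the provider-to-key lookup that `check_llm_environs` performs can also be written as a small pure function, which makes it easy to test without touching the real environment. The following is a minimal sketch under stated assumptions, not the code in this commit: `PROVIDER_KEYS` and `missing_llm_key` are hypothetical names, and the environment is passed in as a plain mapping rather than read from `os.environ`.

```python
from typing import Mapping, Optional

# Provider name -> API key variable, mirroring the branches in check_llm_environs.
PROVIDER_KEYS = {"google": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def missing_llm_key(env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the API key variable that still has to be set.

    Returns None when the configured provider already has its key.
    Raises ValueError for an unset or unsupported GPT_PROVIDER.
    """
    provider = env.get("GPT_PROVIDER")
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"Unsupported GPT provider: {provider!r}")
    key_name = PROVIDER_KEYS[provider]
    # Treat an empty string the same as a missing variable.
    return None if env.get(key_name) else key_name
```

A caller could pass `os.environ` directly and prompt the user only when a key name is returned.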
diff --git a/lib/ai_web_researcher/ai_news_researcher.py b/lib/ai_web_researcher/ai_news_researcher.py
new file mode 100644
index 00000000..f1dedc33
--- /dev/null
+++ b/lib/ai_web_researcher/ai_news_researcher.py
@@ -0,0 +1,172 @@
+################################################################
+#
+#
+#
+##############################################################
+
+import os
+import json
+from pathlib import Path
+import sys
+from typing import List, NamedTuple
+from loguru import logger
+from datetime import datetime
+
+from ..gpt_providers.gemini_pro_text import gemini_text_response
+from .tavily_ai_search import get_tavilyai_results
+from .metaphor_basic_neural_web_search import metaphor_news_summarizer, metaphor_search_articles
+from .google_serp_search import google_news, google_search
+from .google_trends_researcher import do_google_trends_analysis
+from .gpt_blog_sections import get_blog_sections_from_websearch
+from .web_research_report import write_web_research_report
+
+
+# Configure logger
+logger.remove()
+logger.add(sys.stdout,
+ colorize=True,
+ format="{level}|{file}:{line}:{function}| {message}"
+ )
+
+
+def web_news_researcher(search_keywords, time_range=None, include_domains=list(), similar_url=None):
+ """ """
+ print(f"Web Research:Time Range - {time_range},Search Keywords - {search_keywords},Include URLs - {include_domains}")
+ if not include_domains:
+ include_domains = list()
+ # TBD: Keeping the results directory as fixed, for now.
+ os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports",
+ search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
+
+ # Collect all blog titles featuring in search results. This *may help in generating blog titles
+ # closest to competing ones. All search blog titles, given keyword and keywords from analysis, give
+ # llm a good context for the task of generating blog titles.
+ blog_titles = []
+ # Get a list of FAQs from search results.
+ blog_faqs = None
+ google_result = None
+ tavily_result = None
+ report = None
+ try:
+ logger.info(f"Doing Google search for: {search_keywords}\n")
+ google_result = google_search(search_keywords)
+ blog_titles.append(extract_info(google_result, "titles"))
+ except Exception as err:
+ logger.error(f"Failed to do Google Serpapi research: {err}")
+ # Not failing, as tavily would do same and then GPT-V to search.
+
+ try:
+ # FIXME: Include the follow-up questions as blog FAQs.
+ logger.info(f"Doing Tavily AI search for: {search_keywords}")
+ tavily_result = get_tavilyai_results(search_keywords, include_domains)
+ blog_titles.append(tavily_extract_information(tavily_result, "title"))
+ except Exception as err:
+ logger.error(f"Failed to do Tavily AI Search: {err}")
+
+ try:
+ logger.info(f"Start Semantic/Neural web search with Metaphor: {search_keywords}")
+ response_articles = metaphor_search_articles(
+ search_keywords,
+ include_domains=include_domains,
+ time_range=time_range,
+ similar_url=similar_url)
+ blog_titles.append(metaphor_extract_titles_or_text(response_articles, return_titles=True))
+ except Exception as err:
+ logger.error(f"Failed to do Metaphor search: {err}")
+ print(blog_titles)
+
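+ # Default so the print below cannot hit an unbound name if the analysis fails.
+ important_keywords = None
+ try: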
+ logger.info(f"Do Google Trends analysis for given keywords: {search_keywords}")
+ important_keywords = do_google_trends_analysis(search_keywords)
+ except Exception as err:
+ logger.error(f"Failed to do google trends analysis: {err}")
+ print(important_keywords)
+ # Now that we have search results from given keywords. Generate blog title and subtopics suggestions.
+ # 1. Return a list of related keywords along with search volumes.
+ # 2. New blog titles to write on(niche, top) and blog sections.
+ # 3. Competitors list, similar urls if given.
+ print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
+
+
+def metaphor_extract_titles_or_text(json_data, return_titles=True):
+ """
+ Extract either titles or text from the given JSON structure.
+
+ Args:
+ json_data (list): List of Result objects in JSON format.
+ return_titles (bool): If True, return titles. If False, return text.
+
+ Returns:
+ list: List of titles or text.
+ """
+ if return_titles:
+ return [(result.title) for result in json_data]
+ else:
+ return [result.text for result in json_data]
+
+
+def extract_info(json_data, info_type):
+ """
+ Extract information (titles, peopleAlsoAsk, or relatedSearches) from the given JSON.
+
+ Args:
+ json_data (dict): The JSON data.
+ info_type (str): The type of information to extract (titles, peopleAlsoAsk, relatedSearches).
+
+ Returns:
+ list or None: A list containing the requested information, or None if the type is invalid.
+ """
+ if info_type == "titles":
+ return [result.get("title") for result in json_data.get("organic", [])]
+ elif info_type == "peopleAlsoAsk":
+ return [item.get("question") for item in json_data.get("peopleAlsoAsk", [])]
+ elif info_type == "relatedSearches":
+ return [item.get("query") for item in json_data.get("relatedSearches", [])]
+ else:
+ print("Invalid info_type. Please use 'titles', 'peopleAlsoAsk', or 'relatedSearches'.")
+ return None
+
+
+def tavily_extract_information(json_data, keyword):
+ """
+ Extract information from the given JSON based on the specified keyword.
+
+ Args:
+ json_data (dict): The JSON data.
+ keyword (str): The keyword (title, content, answer, follow-query).
+
+ Returns:
+ list or str: The extracted information based on the keyword.
+ """
+ if keyword == 'title':
+ return [result['title'] for result in json_data['results']]
+ elif keyword == 'content':
+ return [result['content'] for result in json_data['results']]
+ elif keyword == 'answer':
+ return json_data['answer']
+ elif keyword == 'follow-query':
+ return json_data['follow_up_questions']
+ else:
+ return f"Invalid keyword: {keyword}"
+
+
+def compete_organic_results(query, report, organic_results):
+ """ Given blog content and Google organic search results, create a new blog to compete against them."""
+ prompt = f""" As an SEO expert and copywriter, I will provide you with my blog content on topic '{query}', and
+ Top google search results.
+ Your task is to rewrite the given blog to make it compete against top position results.
+ Make sure, the new blog has high probability of ranking highest against given organic search result competitors.
+ Modify the given blog content following best SEO practices.
+ Make sure the blog is original, unique and highly readable.
+ Remember, Maintain and adopt the formatting, structure, style and tone of the provided blog content.
+ Include relevant emojis in your final blog for visual appeal. Use it sparingly.
+ Your response should be well-structured, objective, and critically acclaimed blog article based on provided texts.
+
+ Remember, your goal is to create a detailed blog article that will compete against given organic result competitors.
+ Do not provide explanations, suggestions for your response, reply only with your final response.
+ Take your time in crafting your content, do not rush to give the response.
+ Blog Content: '{report}'\n
+ Organic Search result: '{organic_results}'
+ """
+ report = gemini_text_response(prompt)
+ return report
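
Review note: `tavily_extract_information` returns the string `"Invalid keyword: ..."` for an unknown keyword, and a caller that appends the return value to a list of titles would store that string as if it were data. The sketch below shows one possible alternative that raises instead. It is an illustration only: `extract_from_tavily`, `FIELD_FOR_KEYWORD`, and the `sample` dictionary are hypothetical, and the sample merely mimics the fields the function in this diff reads.

```python
from typing import Any, Dict

# Hypothetical mapping from the keyword argument to the top-level response field.
FIELD_FOR_KEYWORD = {"answer": "answer", "follow-query": "follow_up_questions"}


def extract_from_tavily(json_data: Dict[str, Any], keyword: str):
    """Extract one kind of information from a Tavily-style response dict."""
    if keyword in ("title", "content"):
        # Per-result fields: collect one value from every search result.
        return [r.get(keyword) for r in json_data.get("results", [])]
    if keyword in FIELD_FOR_KEYWORD:
        return json_data.get(FIELD_FOR_KEYWORD[keyword])
    # Fail loudly so a misspelled keyword such as "titles" is not mistaken for data.
    raise ValueError(f"Invalid keyword: {keyword}")


# Made-up sample shaped like the fields read above.
sample = {
    "answer": "A short answer",
    "follow_up_questions": ["What next?"],
    "results": [
        {"title": "Post A", "content": "aaa"},
        {"title": "Post B", "content": "bbb"},
    ],
}
print(extract_from_tavily(sample, "title"))
```

Raising keeps the existing `try`/`except` blocks in the caller meaningful, since a bad keyword is then logged as a failure rather than silently collected.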
diff --git a/lib/ai_web_researcher/google_serp_search.py b/lib/ai_web_researcher/google_serp_search.py
index eb214e2b..bcd6228a 100644
--- a/lib/ai_web_researcher/google_serp_search.py
+++ b/lib/ai_web_researcher/google_serp_search.py
@@ -37,7 +37,7 @@ from clint.textui import progress
#from serpapi import GoogleSearch
from loguru import logger
from tabulate import tabulate
-
+from GoogleNews import GoogleNews
# Configure logger
logger.remove()
from dotenv import load_dotenv
@@ -49,7 +49,6 @@ logger.add(
format="{level}|{file}:{line}:{function}| {message}"
)
-from .gpt_titles_faq import gpt_titles_faqs_google_search
#from tenacity import retry, stop_after_attempt, wait_random_exponential
#@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
@@ -199,6 +198,15 @@ def perform_dataforseo_google_search():
return
+def google_news(search_keywords, news_period="7d", region="IN"):
+ """ Get news articles for the given keywords from Google News. """
+ # Build one client with all settings; re-creating it would discard earlier options.
+ googlenews = GoogleNews(lang='en', region=region, period=news_period)
+ googlenews.enableException(True)
+ googlenews.search(search_keywords)
+ return googlenews.results()
+
def process_search_results(search_results):
"""
diff --git a/lib/ai_web_researcher/gpt_online_researcher.py b/lib/ai_web_researcher/gpt_online_researcher.py
index c13e0c91..2a015a3e 100644
--- a/lib/ai_web_researcher/gpt_online_researcher.py
+++ b/lib/ai_web_researcher/gpt_online_researcher.py
@@ -17,10 +17,8 @@ from .tavily_ai_search import get_tavilyai_results
from .metaphor_basic_neural_web_search import metaphor_find_similar, metaphor_search_articles
from .google_serp_search import google_search
from .google_trends_researcher import do_google_trends_analysis
-from .gpt_blog_sections import get_blog_sections_from_websearch
from .web_research_report import write_web_research_report
-
# Configure logger
logger.remove()
logger.add(sys.stdout,
@@ -32,60 +30,63 @@ logger.add(sys.stdout,
def gpt_web_researcher(search_keywords, time_range=None, include_domains=list(), similar_url=None):
""" """
print(f"Web Research:Time Range - {time_range},Search Keywords - {search_keywords},Include URLs - {include_domains}")
+ # TBD: Keeping the results directory as fixed, for now.
+ os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports", search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
if not include_domains:
include_domains = list()
- # TBD: Keeping the results directory as fixed, for now.
- os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports",
- search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
- # Collect all blog titles featuring in search results. This *may help in generating blog titles
- # closest to competing ones. All search blog titles, given keyword and keywords from analysis, give
- # llm a good context for the task of generating blog titles.
- blog_titles = []
- # Get a list of FAQs from search results.
- blog_faqs = None
- google_result = None
- tavily_result = None
- report = None
+ google_search_result = do_google_serp_search(search_keywords)
+ tavily_search_result = do_tavily_ai_search(search_keywords, include_domains)
+ metaphor_search_result = do_metaphor_ai_research(search_keywords, include_domains, time_range, similar_url)
+ gtrends_search_result = do_google_pytrends_analysis(search_keywords)
+ # get_rag_results(search_query)
+ print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
+
+
+def do_google_serp_search(search_keywords):
+ """ """
try:
logger.info(f"Doing Google search for: {search_keywords}\n")
- google_result = google_search(search_keywords)
- blog_titles.append(extract_info(google_result, "titles"))
+ return(google_search(search_keywords))
except Exception as err:
logger.error(f"Failed to do Google Serpapi research: {err}")
# Not failing, as tavily would do same and then GPT-V to search.
+
+def do_tavily_ai_search(search_keywords, include_domains=None):
+ """ """
try:
# FIXME: Include the follow-up questions as blog FAQs.
logger.info(f"Doing Tavily AI search for: {search_keywords}")
- tavily_result = get_tavilyai_results(search_keywords, include_domains)
- blog_titles.append(tavily_extract_information(tavily_result, "titles"))
+ return(get_tavilyai_results(search_keywords, include_domains))
except Exception as err:
logger.error(f"Failed to do Tavily AI Search: {err}")
+
+def do_metaphor_ai_research(search_keywords,
+ include_domains=None,
+ time_range=None,
+ similar_url=None):
+ """ """
try:
logger.info(f"Start Semantic/Neural web search with Metahpor: {search_keywords}")
response_articles = metaphor_search_articles(
- search_keywords,
- include_domains=include_domains,
+ search_keywords,
+ include_domains=include_domains,
time_range=time_range,
similar_url=similar_url)
- blog_titles.append(metaphor_extract_titles_or_text(response_articles, return_titles=True))
+ return response_articles
except Exception as err:
logger.error(f"Failed to do Metaphor search: {err}")
- print(blog_titles)
+
+def do_google_pytrends_analysis(search_keywords):
+ """ """
try:
logger.info(f"Do Google Trends analysis for given keywords: {search_keywords}")
- important_keywords = do_google_trends_analysis(search_keywords)
+ return(do_google_trends_analysis(search_keywords))
except Exception as err:
logger.error(f"Failed to do google trends analysis: {err}")
- print(important_keywords)
- # Now that we have search results from given keywords. Generate blog title and subtopics suggestions.
- # 1. Return a list of related keywords along with search volumes.
- # 2. New blog titles to write on(niche, top) and blog sections.
- # 3. Competitors list, similar urls if given.
- print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
def metaphor_extract_titles_or_text(json_data, return_titles=True):
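
Review note: the four `do_*` helpers introduced in this file share one shape: log, call a backend, and on any exception log the error and fall through to an implicit `None`. A minimal sketch of that pattern as a single helper follows. It is an assumption-laden illustration, not part of the commit: `run_search` and `failing_backend` are hypothetical names, and the standard `logging` module stands in for `loguru` so the snippet has no third-party dependency.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("web_research")


def run_search(name: str, search_fn: Callable[..., Any],
               *args: Any, **kwargs: Any) -> Optional[Any]:
    """Run one search backend; log and return None if it raises."""
    try:
        logger.info("Doing %s search", name)
        return search_fn(*args, **kwargs)
    except Exception as err:
        # One failing backend should not stop the remaining ones.
        logger.error("Failed to do %s search: %s", name, err)
        return None


def failing_backend(query: str):
    """Stand-in for a backend call that fails, e.g. on an exhausted API quota."""
    raise RuntimeError("quota exceeded")


print(run_search("demo", str.upper, "seo tips"))
print(run_search("broken", failing_backend, "seo tips"))
```

Returning `None` explicitly makes the failure case visible to callers, which then need to check for it before using a result.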
diff --git a/lib/ai_web_researcher/metaphor_basic_neural_web_search.py b/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
index d8cfa772..38ad6003 100644
--- a/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
+++ b/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
@@ -70,7 +70,10 @@ def metaphor_find_similar(similar_url):
raise
competitors = search_response.results
- for acompetitor in tqdm(competitors, desc="Processing Competitors", unit="competitor"):
+ urls = {}
+ for c in competitors:
+ print(c.title + ':' + c.url)
+ for acompetitor in tqdm(competitors, desc="Processing URL content", unit="competitor"):
all_contents = ""
try:
search_response = metaphor.search_and_contents(
@@ -82,16 +85,15 @@ def metaphor_find_similar(similar_url):
logger.error(f"Failed to do metaphor keyword/url research: {err}")
research_response = search_response.results
-
# Add a progress bar for the inner loop
for r in tqdm(research_response, desc=f"{acompetitor.url}", unit="research"):
all_contents += r.text
- try:
- acompetitor.text = summarize_competitor_content(all_contents, "gemini")
- except Exception as err:
- logger.error(f"Failed to summarize_web_content: {err}")
+ try:
+ acompetitor.text = summarize_competitor_content(all_contents, "gemini")
+ except Exception as err:
+ logger.error(f"Failed to summarize_web_content: {err}")
- # Convert the data into a list of lists
+ print(competitors)
print_search_result(competitors)
return search_response
@@ -142,7 +144,6 @@ def metaphor_search_articles(query,
logger.error(f"Failed in metaphor.search_and_contents: {err}")
# From each webpage, get a summary of the web page.
- print(search_response)
contents_response = search_response.results
# for content in tqdm(contents_response, desc="Reading Web URL content:", unit="content"):
# summarized_content = summarize_web_content(content.text, "gemini")
@@ -160,18 +161,37 @@ def metaphor_search_articles(query,
raise
+
+def metaphor_news_summarizer(news_keywords):
+ """ Build an LLM-based news summarizer with the Exa API to keep us up-to-date
+ with the latest news on a given topic.
+ """
+ # FIXME: Needs to be user defined.
+ one_week_ago = (datetime.now() - timedelta(days=7))
+ date_cutoff = one_week_ago.strftime("%Y-%m-%d")
+
+ search_response = exa.search_and_contents(
+ news_keywords, use_autoprompt=True, start_published_date=date_cutoff
+ )
+
+ urls = [result.url for result in search_response.results]
+ print("URLs:")
+ for url in urls:
+ print(url)
+
+
def print_search_result(contents_response):
# Define the Result namedtuple
- Result = namedtuple("Result", ["url", "title", "published_date", "text"])
+ Result = namedtuple("Result", ["url", "title", "text"])
# Tabulate the data
- table_headers = ["URL", "Title", "Published Date", "Summary"]
- table_data = [(result.url, result.title, result.published_date, result.text) for result in contents_response]
+ table_headers = ["URL", "Title", "Summary"]
+ table_data = [(result.url, result.title, result.text) for result in contents_response]
table = tabulate(table_data,
headers=table_headers,
tablefmt="fancy_grid",
- colalign=["left", "left", "left", "left"],
- maxcolwidths=[20, 20, 10, 60])
+ colalign=["left", "left", "left"],
+ maxcolwidths=[20, 20, 70])
print(table)
# Save the combined table to a file
try:
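
Review note: `metaphor_news_summarizer` hardcodes a seven-day window and carries a FIXME asking for it to be user defined. One way to do that is to compute the cutoff in a small helper that takes the number of days as a parameter. This is a sketch only; `published_since` is a hypothetical name, and the optional `now` argument exists purely to make the result reproducible in tests.

```python
from datetime import datetime, timedelta
from typing import Optional


def published_since(days: int = 7, now: Optional[datetime] = None) -> str:
    """Return the YYYY-MM-DD date `days` days before `now` (default: current time)."""
    if days < 0:
        raise ValueError("days must be non-negative")
    now = now or datetime.now()
    return (now - timedelta(days=days)).strftime("%Y-%m-%d")


print(published_since(7, now=datetime(2024, 3, 5)))  # prints 2024-02-27
```

The returned string has the same `%Y-%m-%d` format that the diff passes as `start_published_date`.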
diff --git a/lib/ai_web_researcher/tavily_ai_search.py b/lib/ai_web_researcher/tavily_ai_search.py
index 42a8155c..401b6365 100644
--- a/lib/ai_web_researcher/tavily_ai_search.py
+++ b/lib/ai_web_researcher/tavily_ai_search.py
@@ -46,7 +46,6 @@ logger.add(sys.stdout,
)
from tenacity import retry, stop_after_attempt, wait_random_exponential
-from .gpt_titles_faq import gpt_titles_faqs_google_search
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def get_tavilyai_results(keywords, include_urls, search_depth="advanced"):
diff --git a/lib/ai_web_researcher/you_web_reseacher.py b/lib/ai_web_researcher/you_web_reseacher.py
index 207cbd6b..d685f796 100644
--- a/lib/ai_web_researcher/you_web_reseacher.py
+++ b/lib/ai_web_researcher/you_web_reseacher.py
@@ -1,10 +1,14 @@
+import os
+
import requests
from clint.textui import progress
from loguru import logger
+from pathlib import Path
+from dotenv import load_dotenv
+load_dotenv(Path('../../.env'))
-
-def search_ydc_index(search_query, num_web_results=10, country="IN", api_key=""):
+def search_ydc_index(search_query, num_web_results=10, country="IN"):
"""
Search YDC Index API and retrieve results.
@@ -17,24 +21,20 @@ def search_ydc_index(search_query, num_web_results=10, country="IN", api_key="