diff --git a/README.md b/README.md
index 69f39a16..6d5f323a 100644
--- a/README.md
+++ b/README.md
@@ -5,28 +5,15 @@ This toolkit automates and enhances the process of blog creation, optimization,
 ## Features
-### Blog Generation and Optimization
-- **YouTube to Blog Conversion**: Converts YouTube videos into detailed blog posts by extracting and transcribing audio, then generating text-based content. TBD: Audio to blog.
-
 - **Online Research Integration**: Enhances blog content by integrating insights and information gathered from online research, ensuring the content is informative and up-to-date. This gives context for generating content. Tavily AI, Google search, serp and Vision AI is used to scrape web data for context augumentation. TBD: Include CrewAI for web research agents.
 - **Image Generation and Processing**: Utilizes AI models like DALL-E 3, stable difffusion to create relevant images based on blog content. Offers features to process and optimize images for web usage. FIXME: Need more work with stable diffusion.
-- **Write Scholarly Article**: Does search for given keywords, arxiv IDs and write review or blog on research papers. Basically, PDF to Blog.
-
-- **Write blogs from PDFs**: TBD . The code is there, need to abstract/extract it. There is RAG with llamaindex for 'n' pdfs.
-- **
 - **SEO Optimization**: Employs AI to generate SEO-friendly blog titles, meta descriptions, tags, and categories. Ensures content is optimized for search engines.
-- **Blog Output formats**: For easy upload to website, blogs output format can be in plaintext, HTML, Mardown/MLA format.
-
-- **Wordpress Integration**: Implemented generating and uploading blog content, media to wordpress via its REST APIs. Most of the static website which can work with markdown style should work with little testing.
+- **Wordpress, Jekyll Integration**: Generates blog content and uploads it, together with media, to WordPress via its REST API. Most static sites that accept Markdown should work with little testing.
-### Speech-to-Text Conversion
-- **Audio Transcription**: Converts speech from video content into text, facilitating the creation of blogs and articles from video sources.
-- **AI models used**: OpenAI whisper model, (TBD) AssemblyAI
-
 ### AI-Driven Content Creation
 - **Text Generation**: Leverages OpenAI's ChatGPT, Google Gemini Pro for generating text for blogs.
 - **Customizable AI Parameters**: (FIXME) Offers flexibility in adjusting AI parameters like model selection, temperature, and token limits to suit different content needs.
@@ -35,64 +22,62 @@ This toolkit automates and enhances the process of blog creation, optimization,
 - **Analyzing and Extracting Image Details**: Uses OpenAI's Vision API, Google Gemini vision to analyze images and extract details such as alt text, descriptions, titles, and captions, enhancing the SEO of image content.
 ---
-
-## Installation and Configuration
-1. **Clone the Repository**: Clone the toolkit from the provided repository link.
-2. **Install Dependencies**: Install necessary Python packages and libraries.
-
-
-## Installation
----
-
 **Note**: This toolkit is designed for automated blog management and requires appropriate API keys and access credentials for full functionality.
-
-### 1). Prerequisites: pip install requirements.txt
-```
-pip install -r requirements.txt
-```
 ---
-### 2). OpenAI, Gemini API keys
-Create a file .env in the present directory and include OpenAI keys.
-FIXME: The code is little messed up here.
+### Web Research
+- **Keyword Research**: Conduct in-depth keyword research by specifying search queries and time ranges.
+- **Domain-Specific Searches**: Include specific URLs to confine searches to certain domains, such as Wikipedia or competitor websites.
+- **Semantic Analysis**: Explore similar topics and technologies by providing a reference URL for semantic analysis.
----
+### Competitor Analysis
+- **Similar Company Discovery**: Analyze competitor websites to discover similar companies, startups, and technologies.
+- **Industry Insights**: Gain insights into industry trends, market competitors, and emerging technologies.
-This is in active development and needs ironing out. The main concern is make it general purpose, for all.
-Usuability and extendibility are major concerns. This section will be updated soon.
+### Blog Writing
+- **Keyword-Based Blogs**: Generate blog content based on specified keywords, leveraging AI to produce engaging and informative articles.
+- **Audio Blog Generation**: Convert audio from YouTube videos into blog posts, facilitating content creation from multimedia sources.
+- **GitHub Repository Blogs**: Transform GitHub repositories or topics into blog posts, showcasing code examples and project insights.
+- **Scholarly Research Blogs**: Generate blog content based on research papers, summarizing key findings and insights.
-usage: pseo_main.py [-h] [--csv CSV] [--keywords KEYWORDS] [--youtube_urls YOUTUBE_URLS] [--scholar SCHOLAR] [--niche] [--wordpress]
-                    [--output_format {plaintext,markdown,html}]
+### Blogging Tools
+- **Title and Meta Description Generation**: Generate catchy titles and meta descriptions for blog posts to improve SEO and user engagement.
+- **Blog Outline Creation**: Generate outlines for blog posts, aiding in structuring content and organizing ideas.
+- **FAQ Generation**: Automatically generate FAQs (Frequently Asked Questions) based on blog content, enhancing user engagement and SEO.
+- **HTML and Markdown Conversion**: Convert blog posts between HTML and Markdown formats for easy integration with various platforms.
+- **Blog Proofreading**: Proofread blog content for grammar, spelling, and readability, ensuring high-quality output.
+- **Tag and Category Suggestions**: Generate tags and categories for blog posts based on content analysis, improving organization and discoverability.
-options:
-  -h, --help            show this help message and exit
-  --csv CSV             Provide path csv file. Check the template csv for example.
-  --keywords KEYWORDS   Keywords for blog generation.
-  --youtube_urls YOUTUBE_URLS
-                        Comma-separated YouTube URLs for blog generation.
-  --scholar SCHOLAR     Write blog from latest research papers on given keywords. Use 'arxiv_papers_url' to provide a file arxiv url
-                        list.
-  --niche               Flag to generate niche blogs (default: False).
-  --wordpress           Flag to upload blogs to WordPress (default: False).
-  --output_format {plaintext,markdown,html}
-                        Output format of the blogs (default: plaintext).
+### Interactive Mode
+- **User-Friendly Interface**: Navigate tasks and options easily through an interactive command-line interface.
+- **Menu-Driven Interaction**: Choose between various options, tasks, and tools using intuitive menus and prompts.
+- **Task Guidance**: Receive guidance and instructions for each task, facilitating user interaction and decision-making.
----
+## Packages, Tools, and APIs Used
-**Example Usage:**
-- **Keyword usage**:
-```
-python pseo_main.py --keywords "Writesonic AI SEO-optimized blog writing,PepperType AI virtual content assistant,Copysmith AI enterprise eCommerce content,Copy AI artificial intelligence content generator,Jasper AI creative content platform,Contents generative AI content strategy"
-```
-**YouTube usage**:
-```
-python pseo_main.py --youtube https://www.youtube.com/watch?v=yu27PWzJI_Y,https://www.youtube.com/watch?v=WGzoBD-xthI,https://www.youtube.com/watch?v=zizonToFXDs
-```
-**Scholar usage**:
-```
-python pseo_main.py --scholar "GPT-4 Technical Report"
-```
+- **Libraries**:
+  - PyInquirer: For creating interactive command-line interfaces.
+  - Typer: For building CLI applications with ease.
+  - Tabulate: For formatting data in tabular form.
+  - Requests: For making HTTP requests to web APIs.
+  - python-dotenv: For loading environment variables from a .env file.
+- **APIs**:
+  - Metaphor API: Provides semantic search capabilities for finding similar topics and technologies.
+  - Tavily API: Offers AI-powered web search functionality for conducting in-depth keyword research.
+  - SerperDev API: Enables access to search engine results and competitor analysis data.
+  - OpenAI API: Powers the Large Language Models (LLMs) for generating blog content and conducting research.
+  - Gemini API: Another LLM provider for natural language processing tasks.
+  - Ollama API (Work In Progress): An upcoming LLM provider for additional research and content generation capabilities.
+
+## Getting Started
+
+To use this tool, follow these steps:
+
+1. Clone this repository to your local machine.
+2. Install the required dependencies using `pip install -r requirements.txt`.
+3. Run the script by executing `python blogen.py`.
+4. Set up the necessary API keys by following the instructions provided in the script and adding them to the `.env` file.
 ---
 Notes:
diff --git a/blogen.py b/blogen.py
index 418ee606..ba98fb6c 100644
--- a/blogen.py
+++ b/blogen.py
@@ -13,7 +13,8 @@ load_dotenv(Path('.env'))
 app = typer.Typer()
 from lib.ai_web_researcher.gpt_online_researcher import gpt_web_researcher
-
+from lib.ai_web_researcher.metaphor_basic_neural_web_search import metaphor_find_similar
+from lib.ai_writers.keywords_to_blog import write_blog_from_keywords
 def prompt_for_time_range():
@@ -36,7 +37,8 @@ def write_blog_options():
             'type': 'list',
             'name': 'blog_type',
             'message': '📝 Choose a blog type:',
-            'choices': ['Keywords', 'Audio YouTube', 'GitHub', 'Scholar', 'Quit'],
+            'choices': ['Keywords', 'Audio YouTube', 'Programming',
+                        'Scholar', 'News/TBD','Finance/TBD', 'Quit'],
         }
     ]
     answers = prompt(questions)
@@ -55,6 +57,7 @@ def start_interactive_mode():
     text.append("\n⚠️ Alert! 💥❓💥\n", style="bold red")
     text.append("If you know what to write, choose 'Write Blog'\n", style="bold blue")
     text.append("If unsure, lets 'do web research' to write on\n", style="bold red")
+    text.append("If Testing-it-out/getting-started, choose 'Blog Tools'\n", style="bold green")
     text.append("_______________________________________________________________________\n")
     print(text)
@@ -64,28 +67,29 @@ def start_interactive_mode():
             'type': 'list',
             'name': 'mode',
             'message': 'Choose an option:',
-            'choices': ['Write Blog', 'Do Web Research', 'Competitor Analysis', 'FAQ Generator', 'Quit'],
+            'choices': ['Write Blog', 'Do keyword Research', 'Create Blog Images',
+                        'Competitor Analysis', 'Blog Tools', 'Quit'],
         }
     ]
     answers = prompt(questions)
     mode = answers['mode']
     if mode == 'Write Blog':
         write_blog()
-    elif mode == 'Do Web Research':
+    elif mode == 'Do keyword Research':
         do_web_research()
-    elif mode == 'FAQ Generator':
+    elif mode == 'Create Blog Images':
         faq_generator()
     elif mode == 'Competitor Analysis':
-        # https://github.com/com-puter-tips/SEO-Analysis
-        # https://github.com/sundios/SEO-Lighthouse-Multiple-URLs
-        # https://github.com/Gingerbreadfork/Cutlery
         # Metaphor similar search
         competitor_analysis()
-    elif mode == 'News Analysis':
+    elif mode == 'Recent News Summarizer':
         print("""1. Get tavily News.
         2. Get metaphor news.
         3. Get from NewsApi
         4. Get YOU.com News.""")
+        recent_news_summarizer()
+    elif mode == 'Blog Tools':
+        blog_tools()
     elif mode == 'Quit':
         typer.echo("Exiting, F*** Off!")
         raise typer.Exit()
@@ -130,7 +134,7 @@ def check_environment_variables():
     if missing_keys:
         print("\nMost are Free APIs and really worth your while signing up for them.")
-        print(":pile_of_poo::pile_of_poo::pile_of_poo: GO GET THEM, on above urls. [bold red]")
+        print(":pile_of_poo: :pile_of_poo: GO GET THEM, on above urls. [bold red]")
         print("Note: They offer free/limited api calls, so we use most of them to have a lot of free api calls.")
         print("\n[bold red]TBD: Provide option to use user defined search engines.\n")
         for key, description in missing_keys:
@@ -138,11 +142,84 @@ def check_environment_variables():
     else:
         return True
+
+def check_llm_environs():
+    """ Function to check which LLM api is given. """
+    gpt_provider = os.getenv("GPT_PROVIDER")
+
+    if gpt_provider == "google":
+        api_key_var = "GEMINI_API_KEY"
+        missing_api_msg = f"To get your {api_key_var}, please visit: https://aistudio.google.com/app/apikey"
+    elif gpt_provider == "openai":
+        api_key_var = "OPENAI_API_KEY"
+        missing_api_msg = "To get your OpenAI API key, please visit: https://openai.com/blog/openai-api"
+    else:
+        typer.echo("Unsupported GPT provider specified in GPT_PROVIDER environment variable.")
+        return
+
+    if os.getenv(api_key_var) is None:
+        typer.echo(f"The {api_key_var} environment variable is missing.")
+        typer.echo(missing_api_msg)
+        api_key = typer.prompt(f"Please enter your {api_key_var} API Key:")
+        # Update .env file
+        with open(".env", "a") as env_file:
+            env_file.write(f"{api_key_var}={api_key}\n")
+        typer.echo(f"{api_key_var} API Key added to .env file.")
+        return
+
+    if gpt_provider == "openai" and os.getenv("OPENAI_API_KEY") is None:
+        typer.echo("To get your OpenAI API key, please visit: https://openai.com/blog/openai-api")
+
+
 def faq_generator():
     return
+def blog_tools():
+    """ Blogging Aid Tools """
+    os.system("clear" if os.name == "posix" else "cls")
+    text = Text()
+    text.append("_______________________________________________________________________")
+    text.append("\n⚠️ Alert! 💥❓💥\n", style="bold red")
+    text.append("Collection of Helpful Blogging Tools, powered by LLMs.\n", style="bold green")
+    text.append("_______________________________________________________________________\n")
+
+    print(text)
+
+    # https://developers.google.com/speed/docs/insights/v5/get-started
+    questions = [
+        {
+            'type': 'list',
+            'name': 'mode',
+            'message': 'Choose a Blogging Tool:',
+            'choices': ['Write Blog Title', 'Write Blog Meta Description', 'Write Blog Introduction',
+                        'Write Blog conclusion', 'Write Blog Outline', 'Generate Blog FAQs', 'Research blog references',
+                        'Convert Blog To HTML', 'Convert Blog To Markdown', 'Blog Proof Reader',
+                        'Get Blog Tags', 'Get blog categories', 'Get Blog Code Examples', 'Quit',
+                        'Check WebPage Performance',],
+        }
+    ]
+    answers = prompt(questions)
+    mode = answers['mode']
+    if mode == 'Write Blog Title':
+        return
+
+
 def competitor_analysis():
+    """ Do metaphor similar search """
+    text = Text()
+    text.append("_______________________________________________________________________")
+    text.append("\n⚠️ Alert! 💥❓💥\n", style="bold red")
+    text.append("Provide competitor's URL, get details of similar/alternative companies.\n", style="bold red")
+    text.append("Usecases: Know similar companies and alternatives, to given URL\n", style="bold blue")
+    text.append("_______________________________________________________________________\n")
+    print(text)
+    similar_url = typer.prompt(f"Enter Valid URL to get web analysis")
+
+    try:
+        metaphor_find_similar(similar_url)
+    except Exception as err:
+        print(f"[bold red]✖ 🚫 Failed to do similar search.\nError:{err}[/bold red]")
     return
@@ -153,8 +230,7 @@ def write_blog():
     blog_type = write_blog_options()
     if blog_type == 'Keywords':
-        keywords = typer.prompt("Enter keywords for blog generation:")
-        print(f"Write blog based on keywords: {keywords}")
+        blog_from_keyword()
     elif blog_type == 'Audio YouTube':
         audio_youtube = typer.prompt("Enter YouTube URL for audio blog generation:")
         print(f"Write audio blog based on YouTube URL: {audio_youtube}")
@@ -165,10 +241,18 @@ def write_blog():
         scholar = typer.prompt("Enter research papers keywords:")
         print(f"Write blog based on scholar: {scholar}")
     elif blog_type == 'Quit':
-        typer.echo("Exiting, Fuck Off!")
+        typer.echo("Exiting, F*** Off!")
         raise typer.Exit()
+def blog_from_keyword():
+    """ Write blog from given keyword. """
+    print("Write blog based on keywords.")
+    check_llm_environs()
+    keywords = typer.prompt("Enter 'keywords/Blog Title' for blog generation:")
+    final_blog = write_blog_from_keywords(keywords)
+
+
 def do_web_research():
     """
     Do Web Research option with time_range, search_keywords, and include_urls sub-options.
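
Reviewer sketch (not part of the patch): the provider check added in `check_llm_environs` can be read as a lookup from `GPT_PROVIDER` to the matching key variable. The snippet below is a minimal, hedged illustration of that lookup only. The `PROVIDER_KEYS` mapping and the `missing_llm_key` helper are hypothetical names introduced here, and the sketch raises on an unsupported provider instead of echoing a message as the patch does.

```python
import os

# Hypothetical mapping mirroring the two providers handled in the patch.
PROVIDER_KEYS = {
    "google": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_llm_key(environ=os.environ):
    """Return the name of the missing API-key variable, or None if it is set.

    Raises ValueError when GPT_PROVIDER is unset or not a supported provider.
    """
    provider = environ.get("GPT_PROVIDER")
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"Unsupported GPT_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEYS[provider]
    # An empty string counts as missing, unlike the `is None` test in the patch.
    return None if environ.get(key_var) else key_var


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment.
    print(missing_llm_key({"GPT_PROVIDER": "google"}))
```

Passing the environment as a parameter keeps the check testable without touching `.env`; prompting for the key and appending it to the file, as the patch does, would stay in the interactive caller.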
diff --git a/lib/ai_web_researcher/ai_news_researcher.py b/lib/ai_web_researcher/ai_news_researcher.py
new file mode 100644
index 00000000..f1dedc33
--- /dev/null
+++ b/lib/ai_web_researcher/ai_news_researcher.py
@@ -0,0 +1,172 @@
+################################################################
+#
+#
+#
+##############################################################
+
+import os
+import json
+from pathlib import Path
+import sys
+from typing import List, NamedTuple
+from loguru import logger
+from datetime import datetime
+
+from ..gpt_providers.gemini_pro_text import gemini_text_response
+from .tavily_ai_search import get_tavilyai_results
+from .metaphor_basic_neural_web_search import metaphor_news_summarizer, metaphor_search_articles
+from .google_serp_search import google_news, google_search
+from .google_trends_researcher import do_google_trends_analysis
+from .gpt_blog_sections import get_blog_sections_from_websearch
+from .web_research_report import write_web_research_report
+
+
+# Configure logger
+logger.remove()
+logger.add(sys.stdout,
+        colorize=True,
+        format="{level}|{file}:{line}:{function}| {message}"
+    )
+
+
+def web_news_researcher(search_keywords, time_range=None, include_domains=list(), similar_url=None):
+    """ """
+    print(f"Web Research:Time Range - {time_range},Search Keywords - {search_keywords},Include URLs - {include_domains}")
+    if not include_domains:
+        include_domains = list()
+    # TBD: Keeping the results directory as fixed, for now.
+    os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports",
+            search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
+
+    # Collect all blog titles featuring in search results. This *may help in generating blog titles
+    # closest to competing ones. All search blog titles, given keyword and keywords from analysis, give
+    # llm a good context for the task of generating blog titles.
+    blog_titles = []
+    # Get a list of FAQs from search results.
+    blog_faqs = None
+    google_result = None
+    tavily_result = None
+    report = None
+    try:
+        logger.info(f"Doing Google search for: {search_keywords}\n")
+        google_result = google_search(search_keywords)
+        blog_titles.append(extract_info(google_result, "titles"))
+    except Exception as err:
+        logger.error(f"Failed to do Google Serpapi research: {err}")
+        # Not failing, as tavily would do same and then GPT-V to search.
+
+    try:
+        # FIXME: Include the follow-up questions as blog FAQs.
+        logger.info(f"Doing Tavily AI search for: {search_keywords}")
+        tavily_result = get_tavilyai_results(search_keywords, include_domains)
+        blog_titles.append(tavily_extract_information(tavily_result, "title"))
+    except Exception as err:
+        logger.error(f"Failed to do Tavily AI Search: {err}")
+
+    try:
+        logger.info(f"Start Semantic/Neural web search with Metaphor: {search_keywords}")
+        response_articles = metaphor_search_articles(
+                search_keywords,
+                include_domains=include_domains,
+                time_range=time_range,
+                similar_url=similar_url)
+        blog_titles.append(metaphor_extract_titles_or_text(response_articles, return_titles=True))
+    except Exception as err:
+        logger.error(f"Failed to do Metaphor search: {err}")
+    print(blog_titles)
+
+    try:
+        logger.info(f"Do Google Trends analysis for given keywords: {search_keywords}")
+        important_keywords = do_google_trends_analysis(search_keywords)
+    except Exception as err:
+        logger.error(f"Failed to do google trends analysis: {err}")
+    print(important_keywords)
+    # Now that we have search results from given keywords. Generate blog title and subtopics suggestions.
+    # 1. Return a list of related keywords along with search volumes.
+    # 2. New blog titles to write on(niche, top) and blog sections.
+    # 3. Competitors list, similar urls if given.
+    print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
+
+
+def metaphor_extract_titles_or_text(json_data, return_titles=True):
+    """
+    Extract either titles or text from the given JSON structure.
+
+    Args:
+        json_data (list): List of Result objects in JSON format.
+        return_titles (bool): If True, return titles. If False, return text.
+
+    Returns:
+        list: List of titles or text.
+    """
+    if return_titles:
+        return [(result.title) for result in json_data]
+    else:
+        return [result.text for result in json_data]
+
+
+def extract_info(json_data, info_type):
+    """
+    Extract information (titles, peopleAlsoAsk, or relatedSearches) from the given JSON.
+
+    Args:
+        json_data (dict): The JSON data.
+        info_type (str): The type of information to extract (titles, peopleAlsoAsk, relatedSearches).
+
+    Returns:
+        list or None: A list containing the requested information, or None if the type is invalid.
+    """
+    if info_type == "titles":
+        return [result.get("title") for result in json_data.get("organic", [])]
+    elif info_type == "peopleAlsoAsk":
+        return [item.get("question") for item in json_data.get("peopleAlsoAsk", [])]
+    elif info_type == "relatedSearches":
+        return [item.get("query") for item in json_data.get("relatedSearches", [])]
+    else:
+        print("Invalid info_type. Please use 'titles', 'peopleAlsoAsk', or 'relatedSearches'.")
+        return None
+
+
+def tavily_extract_information(json_data, keyword):
+    """
+    Extract information from the given JSON based on the specified keyword.
+
+    Args:
+        json_data (dict): The JSON data.
+        keyword (str): The keyword (title, content, answer, follow-query).
+
+    Returns:
+        list or str: The extracted information based on the keyword.
+    """
+    if keyword == 'title':
+        return [result['title'] for result in json_data['results']]
+    elif keyword == 'content':
+        return [result['content'] for result in json_data['results']]
+    elif keyword == 'answer':
+        return json_data['answer']
+    elif keyword == 'follow-query':
+        return json_data['follow_up_questions']
+    else:
+        return f"Invalid keyword: {keyword}"
+
+
+def compete_organic_results(query, report, organic_results):
+    """ Given a blog content and google search organic results, create a new blog to compete against them."""
+    prompt = f""" As an SEO expert and copywriter, I will provide you with my blog content on topic '{query}', and
+        Top google search results.
+        Your task is to rewrite the given blog to make it compete against top position results.
+        Make sure, the new blog has high probability of ranking highest against given organic search result competitors.
+        Modify the given blog content following best SEO practices.
+        Make sure the blog is original, unique and highly readable.
+        Remember, Maintain and adopt the formatting, structure, style and tone of the provided blog content.
+        Include relevant emojis in your final blog for visual appeal. Use it sparingly.
+        Your response should be well-structured, objective, and critically acclaimed blog article based on provided texts.
+
+        Remember, your goal is to create a detailed blog article that will compete against given organic result competitors.
+        Do not provide explanations, suggestions for your response, reply only with your final response.
+        Take your time in crafting your content, do not rush to give the response.
+        Blog Content: '{report}'\n
+        Organic Search result: '{organic_results}'
+        """
+    report = gemini_text_response(prompt)
+    return report
diff --git a/lib/ai_web_researcher/google_serp_search.py b/lib/ai_web_researcher/google_serp_search.py
index eb214e2b..bcd6228a 100644
--- a/lib/ai_web_researcher/google_serp_search.py
+++ b/lib/ai_web_researcher/google_serp_search.py
@@ -37,7 +37,7 @@ from clint.textui import progress
 #from serpapi import GoogleSearch
 from loguru import logger
 from tabulate import tabulate
-
+from GoogleNews import GoogleNews
 # Configure logger
 logger.remove()
 from dotenv import load_dotenv
@@ -49,7 +49,6 @@ logger.add(
     format="{level}|{file}:{line}:{function}| {message}"
 )
-from .gpt_titles_faq import gpt_titles_faqs_google_search
 #from tenacity import retry, stop_after_attempt, wait_random_exponential
 #@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
@@ -199,6 +198,15 @@ def perform_dataforseo_google_search():
     return
+def google_news(search_keywords, news_period="7d", region="IN"):
+    """ Get news articles from google_news"""
+    googlenews = GoogleNews()
+    googlenews.enableException(True)
+    googlenews = GoogleNews(lang='en', region=region)
+    googlenews = GoogleNews(period=news_period)
+    print(googlenews.get_news('APPLE'))
+    print(googlenews.search('APPLE'))
+
 def process_search_results(search_results):
     """
diff --git a/lib/ai_web_researcher/gpt_online_researcher.py b/lib/ai_web_researcher/gpt_online_researcher.py
index c13e0c91..2a015a3e 100644
--- a/lib/ai_web_researcher/gpt_online_researcher.py
+++ b/lib/ai_web_researcher/gpt_online_researcher.py
@@ -17,10 +17,8 @@ from .tavily_ai_search import get_tavilyai_results
 from .metaphor_basic_neural_web_search import metaphor_find_similar, metaphor_search_articles
 from .google_serp_search import google_search
 from .google_trends_researcher import do_google_trends_analysis
-from .gpt_blog_sections import get_blog_sections_from_websearch
 from .web_research_report import write_web_research_report
-
 # Configure logger
 logger.remove()
 logger.add(sys.stdout,
@@ -32,60 +30,63 @@ logger.add(sys.stdout,
 def gpt_web_researcher(search_keywords, time_range=None, include_domains=list(), similar_url=None):
     """ """
     print(f"Web Research:Time Range - {time_range},Search Keywords - {search_keywords},Include URLs - {include_domains}")
+    # TBD: Keeping the results directory as fixed, for now.
+    os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports", search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
     if not include_domains:
         include_domains = list()
-    # TBD: Keeping the results directory as fixed, for now.
-    os.environ["SEARCH_SAVE_FILE"] = os.path.join(os.getcwd(), "workspace", "web_research_reports",
-            search_keywords.replace(" ", "_") + "_" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
-    # Collect all blog titles featuring in search results. This *may help in generating blog titles
-    # closest to competing ones. All search blog titles, given keyword and keywords from analysis, give
-    # llm a good context for the task of generating blog titles.
-    blog_titles = []
-    # Get a list of FAQs from search results.
-    blog_faqs = None
-    google_result = None
-    tavily_result = None
-    report = None
+    google_search_result = do_google_serp_search(search_keywords)
+    tavily_search_result = do_tavily_ai_search(search_keywords, include_domains)
+    metaphor_search_result = do_metaphor_ai_research(search_keywords, include_domains, time_range, similar_url)
+    gtrends_search_result = do_google_pytrends_analysis(search_keywords)
+    # get_rag_results(search_query)
+    print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
+
+
+def do_google_serp_search(search_keywords):
+    """ """
     try:
         logger.info(f"Doing Google search for: {search_keywords}\n")
-        google_result = google_search(search_keywords)
-        blog_titles.append(extract_info(google_result, "titles"))
+        return(google_search(search_keywords))
     except Exception as err:
         logger.error(f"Failed to do Google Serpapi research: {err}")
         # Not failing, as tavily would do same and then GPT-V to search.
+
+def do_tavily_ai_search(search_keywords, include_domains=None):
+    """ """
     try:
         # FIXME: Include the follow-up questions as blog FAQs.
         logger.info(f"Doing Tavily AI search for: {search_keywords}")
-        tavily_result = get_tavilyai_results(search_keywords, include_domains)
-        blog_titles.append(tavily_extract_information(tavily_result, "titles"))
+        return(get_tavilyai_results(search_keywords, include_domains))
     except Exception as err:
         logger.error(f"Failed to do Tavily AI Search: {err}")
+
+def do_metaphor_ai_research(search_keywords,
+                            include_domains=None,
+                            time_range=None,
+                            similar_url=None):
+    """ """
     try:
         logger.info(f"Start Semantic/Neural web search with Metahpor: {search_keywords}")
         response_articles = metaphor_search_articles(
-                search_keywords,
-                include_domains=include_domains,
+                search_keywords,
+                include_domains=include_domains,
                 time_range=time_range,
                 similar_url=similar_url)
-        blog_titles.append(metaphor_extract_titles_or_text(response_articles, return_titles=True))
+        return response_articles
     except Exception as err:
         logger.error(f"Failed to do Metaphor search: {err}")
-    print(blog_titles)
+
+def do_google_pytrends_analysis(search_keywords):
+    """ """
     try:
         logger.info(f"Do Google Trends analysis for given keywords: {search_keywords}")
-        important_keywords = do_google_trends_analysis(search_keywords)
+        return(do_google_trends_analysis(search_keywords))
     except Exception as err:
         logger.error(f"Failed to do google trends analysis: {err}")
-    print(important_keywords)
-    # Now that we have search results from given keywords. Generate blog title and subtopics suggestions.
-    # 1. Return a list of related keywords along with search volumes.
-    # 2. New blog titles to write on(niche, top) and blog sections.
-    # 3. Competitors list, similar urls if given.
-    print(f"\n\nReview the analysis in this file at: {os.environ.get('SEARCH_SAVE_FILE')}\n")
 def metaphor_extract_titles_or_text(json_data, return_titles=True):
diff --git a/lib/ai_web_researcher/metaphor_basic_neural_web_search.py b/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
index d8cfa772..38ad6003 100644
--- a/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
+++ b/lib/ai_web_researcher/metaphor_basic_neural_web_search.py
@@ -70,7 +70,10 @@ def metaphor_find_similar(similar_url):
         raise
     competitors = search_response.results
-    for acompetitor in tqdm(competitors, desc="Processing Competitors", unit="competitor"):
+    urls = {}
+    for c in competitors:
+        print(c.title + ':' + c.url)
+    for acompetitor in tqdm(competitors, desc="Processing URL content", unit="competitor"):
         all_contents = ""
         try:
             search_response = metaphor.search_and_contents(
@@ -82,16 +85,15 @@ def metaphor_find_similar(similar_url):
             logger.error(f"Failed to do metaphor keyword/url research: {err}")
         research_response = search_response.results
-        # Add a progress bar for the inner loop
         for r in tqdm(research_response, desc=f"{acompetitor.url}", unit="research"):
             all_contents += r.text
-        try:
-            acompetitor.text = summarize_competitor_content(all_contents, "gemini")
-        except Exception as err:
-            logger.error(f"Failed to summarize_web_content: {err}")
+        try:
+            acompetitor.text = summarize_competitor_content(all_contents, "gemini")
+        except Exception as err:
+            logger.error(f"Failed to summarize_web_content: {err}")
-    # Convert the data into a list of lists
+    print(competitors)
     print_search_result(competitors)
     return search_response
@@ -142,7 +144,6 @@ def metaphor_search_articles(query,
         logger.error(f"Failed in metaphor.search_and_contents: {err}")
     # From each webpage, get a summary of the web page.
-    print(search_response)
     contents_response = search_response.results
     # for content in tqdm(contents_response, desc="Reading Web URL content:", unit="content"):
     # summarized_content = summarize_web_content(content.text, "gemini")
@@ -160,18 +161,37 @@ def metaphor_search_articles(query,
         raise
+
+def metaphor_news_summarizer(news_keywords):
+    """ build a LLM-based news summarizer app with the Exa API to keep us up-to-date
+    with the latest news on a given topic.
+    """
+    # FIXME: Needs to be user defined.
+    one_week_ago = (datetime.now() - timedelta(days=7))
+    date_cutoff = one_week_ago.strftime("%Y-%m-%d")
+
+    search_response = exa.search_and_contents(
+        news_keywords, use_autoprompt=True, start_published_date=date_cutoff
+    )
+
+    urls = [result.url for result in search_response.results]
+    print("URLs:")
+    for url in urls:
+        print(url)
+
+
 def print_search_result(contents_response):
     # Define the Result namedtuple
-    Result = namedtuple("Result", ["url", "title", "published_date", "text"])
+    Result = namedtuple("Result", ["url", "title", "text"])
     # Tabulate the data
-    table_headers = ["URL", "Title", "Published Date", "Summary"]
-    table_data = [(result.url, result.title, result.published_date, result.text) for result in contents_response]
+    table_headers = ["URL", "Title", "Summary"]
+    table_data = [(result.url, result.title, result.text) for result in contents_response]
     table = tabulate(table_data,
         headers=table_headers,
         tablefmt="fancy_grid",
-        colalign=["left", "left", "left", "left"],
-        maxcolwidths=[20, 20, 10, 60])
+        colalign=["left", "left", "left"],
+        maxcolwidths=[20, 20, 70])
     print(table)
     # Save the combined table to a file
     try:
diff --git a/lib/ai_web_researcher/tavily_ai_search.py b/lib/ai_web_researcher/tavily_ai_search.py
index 42a8155c..401b6365 100644
--- a/lib/ai_web_researcher/tavily_ai_search.py
+++ b/lib/ai_web_researcher/tavily_ai_search.py
@@ -46,7 +46,6 @@ logger.add(sys.stdout,
     )
 from tenacity import retry, stop_after_attempt, wait_random_exponential
-from .gpt_titles_faq import gpt_titles_faqs_google_search
 @retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
 def get_tavilyai_results(keywords, include_urls, search_depth="advanced"):
diff --git a/lib/ai_web_researcher/you_web_reseacher.py b/lib/ai_web_researcher/you_web_reseacher.py
index 207cbd6b..d685f796 100644
--- a/lib/ai_web_researcher/you_web_reseacher.py
+++ b/lib/ai_web_researcher/you_web_reseacher.py
@@ -1,10 +1,14 @@
+import os
+
 import requests
 from clint.textui import progress
 from loguru import logger
+from pathlib import Path
+from dotenv import load_dotenv
+load_dotenv(Path('../../.env'))
-
-def search_ydc_index(search_query, num_web_results=10, country="IN", api_key=""):
+def search_ydc_index(search_query, num_web_results=10, country="IN"):
     """
     Search YDC Index API and retrieve results.
@@ -17,24 +21,20 @@ def search_ydc_index(search_query, num_web_results=10, country="IN", api_key="