ð¥ðü建造一个AI驱动的Discord Bot，以推荐使用OpenAI，Novu和Agentað的Hackernews帖子-DEV365 开发者社区

tl; dr

在本教程中，您将学习如何创建AI代理，该AI代理会提醒您有关您兴趣的相关黑客新闻帖子。每当帖子符合您的标准时，代理就会发送不和谐通知。

我们将使用Python编写代码，使用BeautifulSoup进行网络刮擦，将OpenAI与Agenta一起构建AI，然后使用Novu进行松弛通知。

目标？黑客新闻不再有无尽的滚动。让AI为您带来有价值的帖子！

Agenta：开源LLM应用程序构建器ðÖ

有点关于我们：Agenta是端到端开源LLM应用程序构建器。它使您能够快速构建，实验，评估和部署LLM应用程序为API。您可以通过在Langchain或任何其他框架中或直接从UI编写代码来使用它。

这是计划ð

我们将编写一个执行以下操作的脚本：

使用Python和Beautifulsoup刮擦前五个黑客新闻页面。
使用Agenta和GPT3.5根据您的兴趣对帖子进行分类。
将引人注目的帖子发送到您的Slack频道。

设置内容：

要开始，让我们创建一个项目文件夹和运行诗歌。如果您不熟悉诗歌，则应该检查一下，因为它为虚拟环境提供了更容易使用的替代方案。

mkdir hnbot; cd hnbot/
poetry init

This command will guide you through creating your pyproject.toml config.


Would you like to define your main dependencies interactively? (yes/no) [yes] yes

Package to add or search for (leave blank to skip): novu
Add a package (leave blank to skip): beautifulsoup4


Do you confirm generation? (yes/no) [yes] yes

按照提示设置您的项目。不要忘记安装novu和Beautifulsoup4。

现在，让我们为我们的软件包创建文件夹并初始化诗歌环境

% mkdir hn_bot; cd hn_bot
% cd hn_bot 
% poetry shell
(hn-bot-py3.9) (base) % poetry install

现在，我们有一个本地环境：

我们所有的要求均已安装。
我们有一个python软件包，称为hn_bot，它在我们的python lib中。

这意味着，如果我们的库中有多个文件，我们可以使用import hn_bot.module_name导入它们。

抓取黑客新闻报道

刮擦黑客新闻页面很简单，因为它不使用任何复杂的JavaScript。这些页面位于https://news.ycombinator.com/?p=pagenumber。

要在页面上找到标题和链接，我们只需要打开Web浏览器并访问Dev Console即可。到达那里后，我们可以检查是否可以使用任何元素来找到标题和链接。幸运的是，似乎每个帖子都是“ titleline”类的跨度。

我们可以使用它从单个页面提取信息。让我们写一个函数，从hacker新闻中提取标题和链接。

# hn_scraper.py
from typing import Dict, List

import requests
from bs4 import BeautifulSoup

def scrape_page(page_number: str) -> List[Dict[str, str]]:
    response = requests.get(f"https://news.ycombinator.com/news?p={page_number}")
    yc_web_page = response.text
    soup = BeautifulSoup(yc_web_page, 'html.parser')

    articles = []

    for article_tag in soup.find_all(name="span", class_="titleline"):
        title = article_tag.getText()
        link = article_tag.find("a")["href"]
        articles.append({"title": title, "link": link})

    return articles

我们可以通过在脚本的末尾添加print(scrape_page(1))并在shell上运行它来测试它：

 % python hn_scraper.py
(['Linux Network Performance Parameters Explained (github.com/leandromoreira)', 'Double Commander – Changes in version 1.1.0 (github.com/doublecmd)', 'If You’ve Got a New Car, It’s a Data Privacy Nightmare (gizmodo.com)', 'Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates', 'Gcsfuse: A user-space file system for interacting with Google Cloud S

恭喜！

创建AI代理ðÖ

现在我们有了帖子列表，我们需要使用OpenAI GPT模型来根据用户的兴趣对它们是否相关。为此，我们将使用Agenta。

Agenta允许您从代码或UI创建LLM应用程序。由于今天的LLM应用非常简单，因此我们将从UI创建它。

代理可以是self-hosted11，但是要快速开始使用demo.agenta.ai。

由于今天的LLM应用非常简单，我们将继续从UI创建它。

您可以自我主机代理（在此处查看文档（https://docs.agenta.ai/installation/local-installation/local-installation）或使用云托管演示。要快速开始，我们将稍后进行。

让我们去demo.agenta.ai并登录。

首先，让我们单击“创建新应用”来创建一个新应用。

然后我们从模板中选择“开始”

并使用单个提示模板

在代理商中进行一些及时的工程

现在我们有一个用于创建应用程序的操场。

首先，让我们为应用程序添加输入。在这种情况下，我们将使用“标题”作为黑客新闻标题和“兴趣”来获得用户的兴趣。

接下来，我们需要进行一些及时的工程。由于我们正在使用GPT3.5（Openai中最便宜的变体）。它采用两个消息：系统消息和用户消息。我们可以使用系统消息来指导语言模型以某种方式回答，而提示提示人类提供了任务的参数。

在这种情况下，我尝试了一个简单的提示，以确保答案为“ true”或“ false”的系统。对于人类的提示，我只是要求系统进行分类。请注意，我们使用FSTRING常规格式将已添加到提示的输入注入。

现在，我们可以用黑客新闻标题的一些示例测试应用程序：

Agenta提供了系统评估应用程序并优化提示，参数和工作流程的工具（如果我们使用的是更复杂的嵌入式和检索增强后代）。但是，在这种情况下，这种评估是不必要的。该应用程序本身非常简单，GPT3.5能够以最小的努力解决分类问题。

让我们保存我们的更改

然后将应用程序部署为API。

为此，我们跳到端点菜单，然后将代码段复制到我们的代码。

包裹

现在，我们可以基于此代码段创建功能。

# llm_classifier.py
import requests
import json

def classify_post(title: str, interests: str) -> bool:

    url = "https://demo.agenta.ai/64f1d1aefeebd024bbdb1ea4/hn_bot/v1/generate"
    params = {
        "inputs": {
            "title": title,
            "interests": interests
        },
        "temperature": 0,
        "model": "gpt-3.5-turbo",
        "maximum_length": 100,
        "prompt_system": "You are an expert in classification. You answer only with True or False.",
        "prompt_human": "Classify whether this hackernews post is interesting for someone with the following interests:\nHacker news post title: {title}\nInterests: {interests}",
        "stop_sequence": "\n",
        "top_p": 1,
        "frequence_penalty": 0,
        "presence_penalty": 0
    }

    response = requests.post(url, json=params)

    data = response.json()

    return bool(data)

发送不和谐消息ð®

首先，我们需要在Discord中创建一个新频道

接下来，我们需要创建一个Webhook并复制URL

现在我们需要在Novu中设置集成。为此，我们必须去Integration Store，单击添加提供商，选择Discord，然后不要忘记激活它！

最后，我们需要创建一个工作流程，以触发要发送到我们的不和谐的消息。我们将将{{content}}变量添加到消息中，我们以后将使用代码注入。

编写消息传递功能

现在是时候写触发工作流的消息了

# novu_bot
from novu.config import NovuConfig
from novu.api import EventApi
from novu.api.subscriber import SubscriberApi
from novu.dto.subscriber import SubscriberDto

NovuConfig().configure("https://api.novu.co", "YOUR_API_KEY")
webhook_url = "..." # the webhook url we got from Discord

def send_message(msg):
    your_subscriber_id = "123"  # Replace this with a unique user ID.

    # Define a subscriber instance
    subscriber = SubscriberDto(
        subscriber_id=your_subscriber_id,
        email="abc@gmail.com",
        first_name="John",
        last_name="Doe"
    )

    SubscriberApi().create(subscriber)
    SubscriberApi().credentials(subscriber_id=your_subscriber_id,
                                provider_id="discord",

    EventApi().trigger(
        name="slackbot",  # The trigger ID of the workflow. It can be found on the workflow page.
        recipients=your_subscriber_id,
        payload={},  # Your Novu payload goes here
    )

将所有东西放在一起

现在，我们准备组装所有元素以使我们的AI助手运行。

让我们创建一个app.py文件，在该文件中我们首先调用刮板，然后调用LLM分类器，最后发送带有有趣帖子的消息。

from hn_bot import hn_scraper, llm_classifier, novu_bot
import schedule
import time

interests = "LLMs, LLMOps, Python, Infrastructure, Tennis, MLOps, Data science, AI, startups, Computational Biology"

def main():
    novu_bot.send_message("Interesting posts at HackerNews:\n")

    posts = hn_scraper.scrape_page("1")
    for title, url in posts:
        if llm_classifier.classify_post(title, interests) == "True":
            novu_bot.send_message(f"{title}\n{url}")

if __name__ == "__main__":
    main()

Et Voila，我们的不和谐中都有所有有趣的帖子。

最后，让我们安排以每小时运行

我们想运行脚本以每小时检查新帖子。为此，我们需要添加Python库时间表

from hn_bot import hn_scraper, llm_classifier, novu_bot
from time import sleep
interests = "LLM, LLMOps, MLOps, Data science, AI, startups"

done_post_titles = []


def main():
    novu_bot.send_message("Interesting posts at HackerNews:\n")

    posts = []
    for i in range(1, 5):
        posts += hn_scraper.scrape_page(i)
    for post in posts:
        title = post["title"]
        url = post["link"]
        if llm_classifier.classify_post(title, interests) and title not in done_post_titles:
            done_post_titles.append(title)
            novu_bot.send_message(f"{title}\n{url}")


if __name__ == "__main__":
    while True:
        main()
        sleep(3600)

恭喜到目前为止！

摘要ð

在本教程中，我们建立了一个由AI驱动的助手，可以在相关的黑客新闻帖子中让您陷入困境。您应该学会：

如何使用Beautifulsoup刮除Hackernews
如何使用Agenta和OpenAI GPT3.5
如何使用novu

您可以在此https://github.com/Agenta-AI/blog/tree/main/hackernews-bot

上检查代码

感谢您的阅读！