Using Language Models in a Streaming Environment to Understand Financial Markets
#python #machinelearning #llm #datastreaming

For those eager to dive into the code, it is available here:

bytewax / news-analyzer

Analyzing financial news in real time with machine learning




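Analyzing news effectively is essential to understanding the world, especially where financial markets are concerned. Being able to quickly identify major events, such as a major company being hacked and sensitive customer data being exposed, lets you react quickly and either capitalize on an opportunity or minimize losses. In this blog post, we will dig into how Bytewax and large language models can analyze financial news in real time, giving you the ability to respond to breaking news more effectively. We need to answer at least three questions to implement our small project successfully:

  • Where do we get the data?
  • How do we analyze it?
  • How do we access the data source and run the analysis in real time?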

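Data Source

For the data source in this demo, we will use the Alpaca news API, which provides websocket access to news articles from Benzinga. To set up an account and create an API key and secret, you can follow the Alpaca documentation. You could use any websocket as a data source. A future follow-up will look at how we could build our own real-time news aggregation pipeline for analysis.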

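Content Analysis

Unsurprisingly, we are going to use large language models (LLMs) to analyze the news articles. When looking for an LLM, the best place that comes to mind is Hugging Face. Hugging Face is a company that provides a marketplace for models along with their transformers library. First, we need to run sentiment analysis on the headlines, which quickly yields valuable insight. For this we will use a fine-tuned BERT model called FinancialBERT. Then we will summarize the content of the article, and for that there is a fine-tuned BART model. Both can be found on huggingface.co. We will also cover how to run the models with the transformers library.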

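Real-Time Data Processing with Bytewax

In case you are not familiar with Bytewax: Bytewax is a stateful stream processor that can be used to analyze data in real time, with support for stateful operators such as windowing and aggregations. Bytewax is particularly well suited to workflows that draw on the Python tooling ecosystem, from data tools such as Pandas to machine-learning-focused tools such as Hugging Face Transformers. It also supports a variety of data sources, including websockets.

Let's get started analyzing news in real time. First things first: dependencies!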

!pip install bytewax transformers torch sentencepiece websocket-client

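Building Our Dataflow

A Bytewax dataflow is a series of steps that transform data from an input source and then write it to an output. At each step, an operator controls the flow of data: whether it should be filtered, aggregated, or accumulated. A developer writing a dataflow writes the Python code that performs the data transformation at each step.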
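To make the idea of chained steps concrete, here is a minimal plain-Python sketch of items moving through a filter step and a map step. It is only an illustration of the concept and does not use the Bytewax API; `run_steps` and the sample records are hypothetical names invented for this example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Item = Tuple[str, dict]


def run_steps(source: Iterable[Item], steps: List[Tuple[str, Callable]]) -> Iterator[Item]:
    """Push each item through the steps in order; a "filter" step may drop it."""
    for item in source:
        keep = True
        for kind, fn in steps:
            if kind == "map":
                item = fn(item)  # transform the item
            elif kind == "filter" and not fn(item):
                keep = False  # drop the item and skip the remaining steps
                break
        if keep:
            yield item


# Hypothetical input: (key, record) pairs, similar in shape to the news items used later.
source = [
    ("benzinga", {"id": 1, "headline": "Stocks rally"}),
    ("benzinga", {"id": 2, "headline": ""}),
]

steps = [
    ("filter", lambda kv: bool(kv[1]["headline"])),  # keep only items with a headline
    ("map", lambda kv: (kv[0], {**kv[1], "length": len(kv[1]["headline"])})),
]

result = list(run_steps(source, steps))
print(result)
```

In a real dataflow the stream processor takes care of distributing this work and managing state; the sketch only shows the order in which steps see each item.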

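Input

To start our dataflow, we will create an input from the Alpaca websocket, which we will use to subscribe to articles for multiple tickers. Note that you need an Alpaca API key and secret; we recommend storing them as environment variables.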

import os
import json

from bytewax.dataflow import Dataflow
from bytewax.inputs import ManualInputConfig, distribute

from websocket import create_connection

API_KEY = os.getenv("API_KEY")
API_SECRET = os.getenv("API_SECRET")

ticker_list = ["*"]

def input_builder(worker_index, worker_count, resume_state):
    state = resume_state or None
    worker_tickers = list(distribute(ticker_list, worker_index, worker_count))
    print({"subscribing to": worker_tickers})

    def news_input(worker_tickers, state):
        ws = create_connection("wss://stream.data.alpaca.markets/v1beta1/news")
        print(ws.recv())
        ws.send(json.dumps({"action":"auth","key":f"{API_KEY}","secret":f"{API_SECRET}"}))
        print(ws.recv())
        ws.send(json.dumps({"action":"subscribe","news":worker_tickers}))
        print(ws.recv())

        while True:
            # To run without API credentials, comment out the ws.recv() line below
            # and assign `articles` a hard-coded list instead, for example the
            # sample JSON payload shown after this listing, parsed with json.loads.
            articles = json.loads(ws.recv())
            for article in articles:
                # Key each article by its source so downstream stateful steps
                # keep one state object per source.
                yield state, (article["source"], article)

    return news_input(worker_tickers, state)

flow = Dataflow()
flow.input("inp", ManualInputConfig(input_builder))
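The listing above reads the credentials with os.getenv, which returns None when a variable is unset, so a missing key only shows up later as a failed authentication message. A small sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of Bytewax or the Alpaca client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in place of the os.getenv calls (left commented out so the
# sketch runs without credentials):
# API_KEY = require_env("API_KEY")
# API_SECRET = require_env("API_SECRET")
```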

The data returned from the news API looks like the JSON shown here.

[{"T":"n","id":31248067,"headline":"Tesla Vehicles Could Be Banned From Leaving During A Hurricane In This State","summary":"A lawmaker in one American state could make it hard for owners of electric vehicles to get out of the state in the event of a hurricane. Here's the potential law and why it's important.","author":"Chris Katje","created_at":"2023-03-07T22:58:40Z","updated_at":"2023-03-07T22:58:40Z","url":"https://www.benzinga.com/news/23/03/31248067/tesla-vehicles-could-be-banned-from-leaving-during-a-hurricane-in-this-state","content":"\u003cp\u003eA lawmaker in one American state could make it hard for owners of electric vehicles ... ertical Integration After Investor Day, But Want More From Elon Musk: \u0026#39;Long On Vision, Short On Specifics\u0026#39;\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\r\n\r\n\u003cp\u003e\u003cem\u003ePhoto:\u0026nbsp;\u003ca href=\"https://www.shutterstock.com/g/hsaduraphotos\"\u003eHenryk Sadura\u003c/a\u003e\u0026nbsp;via Shutterstock\u003c/em\u003e\u003c/p\u003e\r\n\r\n\u003cp\u003e\u003cbr /\u003e\r\n\u0026nbsp;\u003c/p\u003e\r\n","symbols":["TSLA"],"source":"benzinga"}]

We will use it in the next steps of the dataflow to analyze sentiment and produce a summary.
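A frame received from the websocket is a JSON array, and not every element is guaranteed to be a news article. The sketch below shows one defensive way to turn a raw frame into (source, article) pairs. It assumes that news items are tagged "T": "n" as in the sample payload; the control message in the demo frame and the `iter_news` helper are illustrative, not taken from the original code.

```python
import json
from typing import Iterator, Tuple


def iter_news(raw: str) -> Iterator[Tuple[str, dict]]:
    """Yield (source, article) pairs for the news items in one websocket frame."""
    for message in json.loads(raw):
        # Assumption: news items carry "T": "n", as in the sample payload.
        if message.get("T") != "n":
            continue  # skip anything that is not a news item
        # Fall back to a placeholder key if the source field is missing.
        yield message.get("source", "unknown"), message


# Illustrative frame: one non-news message followed by one trimmed news item.
frame = json.dumps([
    {"T": "success", "msg": "authenticated"},
    {
        "T": "n",
        "id": 31248067,
        "headline": "Tesla Vehicles Could Be Banned From Leaving During A Hurricane In This State",
        "symbols": ["TSLA"],
        "source": "benzinga",
    },
])

pairs = list(iter_news(frame))
print(pairs[0][0], pairs[0][1]["id"])
```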

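Handling Duplicates and Updates

When working with news from RSS/Atom feeds or news APIs, it is common to receive duplicates as a story is created and then updated. To avoid analyzing these duplicates multiple times and incurring the extra overhead of running the ML models on the same story, we will use the Bytewax operator stateful_map to create a simplified storage layer. We will store a list of unique identifiers for every news article we encounter. If we have seen an article before, we flag it as an update. Otherwise, we add the article's ID to the state object. To filter out the updates and avoid reclassifying and re-summarizing them, we will use the filter operator. Think of this process as the equivalent of checking a database for a unique ID.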

def update_articles(articles, news):
    if news['id'] in articles:
        news['update'] = True
    else:
        articles.append(news['id'])
        news['update'] = False
    return articles, news

flow.stateful_map("source_articles", lambda: list(), update_articles)

flow.filter(lambda x: not x[1]['update'])
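To see how the state evolves, the deduplication logic can be exercised outside a dataflow. The sketch below is a simplified stand-in for a keyed stateful step followed by a filter, not the Bytewax implementation. It also swaps the list for a set to make membership checks cheaper, which is a substitution and not what the listing above does; `simulate` is a hypothetical helper.

```python
def update_articles(articles, news):
    """Flag a news item as an update if its id was seen before (set-based variant)."""
    news = dict(news)  # copy so the caller's dict is not mutated
    news["update"] = news["id"] in articles
    articles.add(news["id"])
    return articles, news


def simulate(items):
    """Simplified stand-in: keep one state object per key, then drop updates."""
    state_by_key = {}
    for key, news in items:
        state = state_by_key.setdefault(key, set())
        state_by_key[key], news = update_articles(state, news)
        if not news["update"]:
            yield key, news


# The third item repeats id 1, so it is treated as an update and filtered out.
stream = [
    ("benzinga", {"id": 1}),
    ("benzinga", {"id": 2}),
    ("benzinga", {"id": 1}),
]

kept = [news["id"] for _, news in simulate(stream)]
print(kept)
```

Note that the state grows with every new article ID, so a long-running deployment would likely need some form of expiry.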

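Sentiment Analysis

Sentiment analysis is the next step in our process. Our approach uses a fine-tuned Hugging Face model to analyze the sentiment of each article's headline. We will use a BERT model for this purpose. BERT stands for Bidirectional Encoder Representations from Transformers and was developed by Google. For a detailed look at how the model works and how it was trained, you can refer to the model card on Hugging Face or the accompanying research paper. Since we want to analyze each news article independently, the sentiment classification happens in a map operator. While designing a novel model architecture and creating the training dataset are hard, applying the model for sentiment analysis is very simple. Note that if you are following along in a notebook, the model will take a while to download the first time.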

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline, AutoModelForSeq2SeqLM

sent_tokenizer = AutoTokenizer.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis")
sent_model = AutoModelForSequenceClassification.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis")
sent_nlp = pipeline("sentiment-analysis", model=sent_model, tokenizer=sent_tokenizer)

def sentiment_analysis(ticker__news):
    ticker, news = ticker__news
    sentiment = sent_nlp([news["headline"]])
    news['sentiment'] = sentiment[0]
    print(sentiment[0])
    return (ticker, news)

flow.map(sentiment_analysis)
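The pipeline stores a dictionary with a label and a score on each news item. For downstream use it can be handy to collapse that into a single signed number. The sketch below assumes the labels are "positive", "negative" and "neutral"; check the model card for the exact label names. `signed_score` is a hypothetical helper that does not appear in the original dataflow.

```python
def signed_score(sentiment: dict) -> float:
    """Map a {"label": ..., "score": ...} result to a value in [-1, 1]."""
    label = sentiment["label"].lower()
    score = float(sentiment["score"])
    if label == "positive":
        return score
    if label == "negative":
        return -score
    return 0.0  # neutral or any unrecognized label


# Hand-written examples in the shape a text-classification pipeline returns.
print(signed_score({"label": "positive", "score": 0.9}))
print(signed_score({"label": "negative", "score": 0.75}))
print(signed_score({"label": "neutral", "score": 0.6}))
```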

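Article Summarization

With the sentiment of the article analyzed, we will use the BART (Bidirectional Auto-Regressive Transformers) model architecture, which combines ideas from Google's BERT and OpenAI's GPT architectures, to summarize its content. Although a great deal of effort goes into creating such a model, using it through the Hugging Face transformers library is relatively easy. We can build a summarization pipeline and apply it in a map step. For better results, we also include an extra step in this map function that cleans the text before summarizing it.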

import re

# Let's create a summarization pipeline
sum_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
sum_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
summarizer = pipeline("summarization", tokenizer=sum_tokenizer, model=sum_model)
tag_re = re.compile(r'(<!--.*?-->|<[^>]*>)')

def summarize(ticker__news):
    ticker, news = ticker__news
    article = news['content']
    article_no_tags = tag_re.sub('', article)
    article_no_tags = article_no_tags.replace("\r", "").replace("\n", "")
    summary = summarizer(article_no_tags, max_length=130, min_length=30, do_sample=False)
    news['bart_summary'] = summary[0]['summary_text']
    print(f"bart summary:{summary[0]['summary_text']}")
    return (ticker, news)

flow.map(summarize)
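The article content in the sample payload contains HTML entities such as &rsquo; and &nbsp; in addition to tags, and removing line breaks outright can glue neighboring paragraphs together. The sketch below shows one possible, slightly more thorough cleaning step using only the standard library. `clean_article` is a hypothetical helper offered as an alternative to the cleaning inside summarize above, not a description of it.

```python
import html
import re

TAG_RE = re.compile(r"(<!--.*?-->|<[^>]*>)", re.DOTALL)
WHITESPACE_RE = re.compile(r"\s+")


def clean_article(content: str) -> str:
    """Strip tags, decode HTML entities and collapse whitespace."""
    # Replace tags with a space so text from adjacent paragraphs stays separated.
    # This can leave a stray space around inline tags, which is harmless here.
    text = TAG_RE.sub(" ", content)
    # Decode entities such as &rsquo; and &nbsp; into plain characters.
    text = html.unescape(text)
    # Collapse runs of whitespace (including non-breaking spaces) into one space.
    text = WHITESPACE_RE.sub(" ", text)
    return text.strip()


sample = (
    "<p>Here&rsquo;s the law.</p>\r\n\r\n"
    "<p><strong>Why:</strong>&nbsp;it matters.</p>"
)
cleaned = clean_article(sample)
print(cleaned)
```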

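Output

With the news analyzed, we can set up a capture step to output the modified news object and then run our dataflow. In this case we will write the output to stdout so that we can easily view it, but in a production system we could write the results to a downstream Kafka topic or a database for further analysis.

If you are following along in a notebook, remember that this requires authentication to work, so your Alpaca API key and secret need to be set.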

from bytewax.execution import run_main
from bytewax.outputs import StdOutputConfig

flow.capture(StdOutputConfig())

if __name__ == "__main__":
    run_main(flow)
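When the results go to a downstream system rather than the console, a compact serialized record is usually easier to handle than the full news object with its raw HTML content. The sketch below builds one JSON line per article. The field selection and the `to_json_line` helper are assumptions made for illustration.

```python
import json

# Fields worth keeping downstream; the raw HTML "content" is left out on purpose.
FIELDS = ("id", "headline", "sentiment", "bart_summary", "url", "symbols")


def to_json_line(key__news):
    """Serialize a (key, news) pair into a single compact JSON string."""
    key, news = key__news
    record = {"source": key}
    record.update({field: news[field] for field in FIELDS if field in news})
    return json.dumps(record, ensure_ascii=False)


example = (
    "benzinga",
    {
        "id": 1,
        "headline": "Example headline",
        "content": "<p>Raw HTML that is not forwarded</p>",
        "sentiment": {"label": "neutral", "score": 0.5},
    },
)
line = to_json_line(example)
print(line)
```

One way to use it would be an extra flow.map(to_json_line) step placed before the capture step.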

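Wrapping Up

Although our example is simplified, it shows the power of Bytewax together with language models from Hugging Face. We can analyze financial news articles in real time, identify major events, and make informed decisions. Using the Alpaca news API as our data source, we were able to build a dataflow that removes duplicate stories, classifies headline sentiment, and summarizes the content of each article.

The ease of implementation with Python-native Bytewax and the Hugging Face transformers library puts these state-of-the-art language models within reach of data engineers and researchers for use in their own projects. We hope this blog post is a useful guide for anyone looking to use real-time news analysis in their financial decision-making.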