First steps with Llama

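It's time to try some coding with Generative AI and LLMs (large language models).

I had one requirement: it should run locally on my computer rather than through a web API such as the one OpenAI provides.

Looking for a way to do that, I stumbled upon Georgi Gerganov's llama.cpp, which also has bindings for several other programming languages.

I went with Python and llama-cpp-python, since my goal was just to get a small project running locally.

Set up the project.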

# Create a folder for the project
mkdir test-llama-cpp-python && cd $_

# Create a virtual environment
pyenv virtualenv 3.9.0 llm
pyenv activate llm

# Install the llama-cpp-python package
python -m pip install llama-cpp-python
python -m pip freeze > requirements.txt

# Create an empty main.py
touch main.py

# Open up the project in VS Code
code .
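Before writing any real code, it can be useful to confirm that the package landed in the active virtual environment. The following is a minimal sketch, not part of the original setup; the helper name is_installed is hypothetical, and it only assumes that llama_cpp is the import name of the llama-cpp-python package (the same name imported later in this post).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "llama_cpp" is the import name of the llama-cpp-python package
    print("llama_cpp available:", is_installed("llama_cpp"))
```

If this prints False, the virtual environment is probably not activated, or the install step did not complete.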

In the main file, add a simple skeleton of a prompt loop.

import os

def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    return ""

def clear():
    """Clears the terminal screen."""
    os.system('cls' if os.name == 'nt' else 'clear')

def main():
    """The prompt loop."""
    clear()

    while True:
        cli_prompt = input("You: ")

        if cli_prompt == "exit":
            break
        else:
            answer = get_reply(cli_prompt)

            print(f"""Llama: {answer}""")


if __name__ == '__main__':
    main()

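From the examples on GitHub, we can see that we need to import the Llama class, and that we also need a model.

Hugging Face is a popular AI community where we can find models to use. There is one requirement for the model file: it must be in the GGML file format. The llama.cpp GitHub project includes a converter that can do this.

Instead, I searched for models that were already in this format and went with the first one I found, TheBloke/Llama-2-7B-Chat-GGML. In the end I downloaded the following model file: llama-2-7b-chat.ggmlv3.q4_1.bin.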
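Since the script will refer to the model by file name, a quick check that the download is really in the project folder can save a confusing error later. This is only a sketch under assumptions: find_model and MODEL_FILE are hypothetical names introduced here, and the check confirms that the file exists and is not empty, not that it is a valid GGML model.

```python
from pathlib import Path

# Assumed file name: the model downloaded in this post, placed in the project folder
MODEL_FILE = "llama-2-7b-chat.ggmlv3.q4_1.bin"


def find_model(folder=".", filename=MODEL_FILE):
    """Return the model path if the file exists and is non-empty, else None."""
    path = Path(folder) / filename
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


if __name__ == "__main__":
    model = find_model()
    if model is None:
        print(f"Model file {MODEL_FILE} not found in the current folder")
    else:
        # Report the size in GiB as a rough sanity check on the download
        print(f"Found {model} ({model.stat().st_size / 1024**3:.2f} GiB)")
```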

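Choosing the right or best model is a topic of its own and out of scope for this post.

With the model downloaded to the project folder, we can update our main.py file to start using the Llama class and the model.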

from llama_cpp import Llama

llama = Llama(model_path='llama-2-7b-chat.ggmlv3.q4_1.bin', verbose=False)

def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    response = llama(
        f"""Q: {prompt} A:""", max_tokens=64, stop=["Q:", "\n"], echo=False
    )

    return response["choices"].pop()["text"].strip()

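We start by importing the Llama class and initializing a Llama object. The constructor needs the path to the model file, given by model_path. I also set the verbose flag to False to suppress the noisy messages from the llama-cpp-python package.

The get_reply function does all the local inference with the llama-cpp-python package. The text sent to the model needs to be formatted in a specific way, which is why the Q: and A: markers are added around the prompt.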
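To make the prompt format and the shape of the result easier to see without loading the model, the two steps in get_reply can be pulled apart as in the sketch below. The helper names build_prompt and extract_text are hypothetical, and sample_response is a hand-written stand-in that mimics only the parts of the response that get_reply reads (a "choices" list whose entries carry a "text" field); it is not real model output.

```python
def build_prompt(question):
    """Wrap the user's text in the Q:/A: pattern used by get_reply."""
    return f"Q: {question} A:"


def extract_text(response):
    """Pull the generated text out of a completion-style response dict."""
    # Index instead of pop() so the response dict is left unmodified
    return response["choices"][0]["text"].strip()


# Hand-written stand-in for a model response, not real model output
sample_response = {
    "choices": [{"text": "  Paris \n", "index": 0, "finish_reason": "stop"}]
}

print(build_prompt("What is the capital of France?"))
print(extract_text(sample_response))
```

The stop list passed in get_reply, ["Q:", "\n"], ends generation as soon as the model starts a new question or a new line, which is what keeps the reply to a single answer.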

Here is the final version of the code.

import os

from llama_cpp import Llama

llama = Llama(model_path="llama-2-7b-chat.ggmlv3.q4_1.bin", verbose=False)


def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    response = llama(
        f"""Q: {prompt} A:""", max_tokens=64, stop=["Q:", "\n"], echo=False
    )

    return response["choices"].pop()["text"].strip()


def clear():
    """Clears the terminal screen."""
    os.system("cls" if os.name == "nt" else "clear")


def main():
    """The prompt loop."""
    clear()

    while True:
        cli_prompt = input("You: ")

        if cli_prompt == "exit":
            break
        else:
            answer = get_reply(cli_prompt)

            print(f"""Llama: {answer}""")


if __name__ == "__main__":
    main()

Test it by running the following in the CLI.

python main.py

Ask a question; typing exit closes the prompt.

You: What are the names of the planets in the solar system?
Llama: The planets in our solar system, in order from closest to farthest from the Sun, are: Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune
You: exit

Until next time!