Code Collector Tech Tutorials 2025-01-20

Exploring Llama.cpp and Llama-cpp-python: Running Large Language Models with Ease

Introduction

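Large language models (LLMs) are increasingly used across AI and software development. Llama.cpp and its Python binding, Llama-cpp-python, offer a convenient way to run LLM inference on your own machine. This article shows how to use Llama-cpp-python from LangChain and discusses common challenges and how to address them.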

Main Content

What are Llama.cpp and Llama-cpp-python?

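Llama.cpp is a C/C++ library for running inference on a wide range of LLMs, and Llama-cpp-python is its Python binding. Llama-cpp-python lets developers run these models directly from Python, including the many models published on platforms such as Hugging Face. llama.cpp loads models stored in the GGUF file format.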
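llama.cpp expects models in the GGUF format, so it can help to confirm that a downloaded file really is GGUF before passing it to Llama-cpp-python. The sketch below uses only the standard library. It reads the fixed-size start of the header: the 4-byte magic "GGUF", a version number, and the tensor and metadata counts. It assumes the GGUF v2+ little-endian layout. The names parse_gguf_header and read_gguf_header are hypothetical helpers and are not part of llama.cpp or Llama-cpp-python.

```python
import struct

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these 4 bytes
# Assumed GGUF v2+ layout after the magic: version (uint32), tensor count (uint64),
# metadata key/value count (uint64), all little-endian
_HEADER = struct.Struct("<IQQ")


def parse_gguf_header(data: bytes):
    """Parse the fixed part of a GGUF header from raw bytes.

    Returns None if the magic is missing or the data is too short.
    """
    if len(data) < 4 + _HEADER.size or data[:4] != GGUF_MAGIC:
        return None
    version, n_tensors, n_kv = _HEADER.unpack_from(data, 4)
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}


def read_gguf_header(path: str):
    """Read just enough bytes from a model file to check whether it is GGUF."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(4 + _HEADER.size))
```

For example, calling read_gguf_header on the path of your model file (the path is up to you) returns a small dict for a GGUF file. It returns None for anything else, such as an older GGML-format .bin model.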

Installation Guide

There are several ways to install Llama-cpp-python, depending on your hardware:

CPU version: works on any standard CPU.

%pip install --upgrade --quiet llama-cpp-python

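GPU acceleration (CUDA, formerly cuBLAS): for NVIDIA GPU users. Reinstall the library from source with CUDA enabled. The --force-reinstall --no-cache-dir options make pip rebuild it instead of reusing a previously built CPU-only wheel. Recent releases use the flag -DGGML_CUDA=on; older releases used -DLLAMA_CUBLAS=on.

!CMAKE_ARGS="-DGGML_CUDA=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

Metal support (macOS): for Macs with Apple Silicon chips. Recent releases use the flag -DGGML_METAL=on; older releases used -DLLAMA_METAL=on.

!CMAKE_ARGS="-DGGML_METAL=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python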

Windows users: install Llama-cpp-python by compiling it from source. This requires Git, Python, CMake, and Visual Studio with the "Desktop development with C++" workload.
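The choice between these installation options can also be scripted. The sketch below picks a backend with a rough heuristic: Apple Silicon means Metal, an nvidia-smi executable on PATH means CUDA, and anything else means CPU. It then builds the matching pip command and CMAKE_ARGS environment without running them. The helper names and the heuristic are assumptions for illustration only. The flag names follow recent llama-cpp-python releases; older releases used -DLLAMA_CUBLAS=on and -DLLAMA_METAL=on instead. missing_build_tools only checks whether git and cmake are on PATH; it does not detect Visual Studio.

```python
import os
import platform
import shutil
import sys

# Assumed CMake flags for recent llama-cpp-python releases
# (older releases: -DLLAMA_CUBLAS=on / -DLLAMA_METAL=on)
CMAKE_FLAGS = {
    "cpu": None,
    "cuda": "-DGGML_CUDA=on",
    "metal": "-DGGML_METAL=on",
}


def detect_backend() -> str:
    """Hypothetical heuristic: Apple Silicon -> metal, nvidia-smi on PATH -> cuda, else cpu."""
    if sys.platform == "darwin" and platform.machine() == "arm64":
        return "metal"
    if shutil.which("nvidia-smi"):
        return "cuda"
    return "cpu"


def pip_install_command(backend: str):
    """Return (argv, env) for installing llama-cpp-python with the given backend."""
    if backend not in CMAKE_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    argv = [sys.executable, "-m", "pip", "install", "--upgrade", "llama-cpp-python"]
    env = dict(os.environ)
    flag = CMAKE_FLAGS[backend]
    if flag:
        env["CMAKE_ARGS"] = flag
        # Force a fresh source build instead of reusing a cached CPU-only wheel
        argv += ["--force-reinstall", "--no-cache-dir"]
    else:
        # Plain CPU build: drop any CMAKE_ARGS inherited from the shell
        env.pop("CMAKE_ARGS", None)
    return argv, env


def missing_build_tools(tools=("git", "cmake")):
    """List source-build tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

To actually run the installation, pass the result to the standard subprocess module: argv, env = pip_install_command(detect_backend()), then subprocess.run(argv, env=env, check=True).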

Running Models in LangChain with Llama-cpp-python

Once installed, Llama-cpp-python can run models inside LangChain. Here is a basic example.

Code Example

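The following example uses LangChain's LlamaCpp wrapper to load a local GGUF model (here, OpenOrca-Platypus2 13B quantized to q4_0) and ask it a question: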

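from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

# The model runs locally through llama.cpp; no remote API is called
llm = LlamaCpp(
    model_path="/path/to/your/model/openorca-platypus2-13b.gguf.q4_0.bin",  # local model file
    n_ctx=4096,  # context window (prompt + output); LlamaCpp defaults to only 512 tokens
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    # Stream generated tokens to stdout as they are produced
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,  # print llama.cpp diagnostic output to stderr
)

# Wrap the question in a simple prompt and pipe it into the model
prompt = PromptTemplate.from_template("Question: {question}\n\nAnswer:")
chain = prompt | llm

question = "What is the capital of France?"
# The answer is streamed by the callback, then printed again in full here
response = chain.invoke({"question": question})
print(response)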
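The temperature and top_p arguments control how the next token is sampled. The stdlib-only sketch below is a generic textbook version of the idea, not llama.cpp's sampler, which chains additional samplers in its own order. It divides the logits by the temperature, applies a softmax, keeps the smallest set of most likely tokens whose cumulative probability reaches top_p (nucleus sampling), and draws one token from that set. The function name sample_next_token is hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=0.75, top_p=1.0, rng=random):
    """Pick a token index from raw logits with temperature scaling and top-p filtering.

    Illustrative sketch only; llama.cpp's real sampling pipeline is more involved.
    """
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always take the highest logit
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax of logits / temperature (subtract the max for numerical stability)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices treats weights as relative, so no explicit renormalisation is needed
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With top_p=1, as in the LlamaCpp example, the nucleus filter keeps every token, so only the temperature shapes the distribution. Values below 1 sharpen it toward the most likely tokens, and values above 1 flatten it.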