Building and Installing llama-cpp

2023-09-13 AI llama

Prerequisites

Install cmake, download from https://cmake.org/
Install python, version 3.9 or later
Install git
Get the source code: git clone https://ghproxy.com/https://github.com/kill8g/llama.cpp.git

Windows environment

Install Visual Studio, version 2017 or later
Install CUDA (optional; skip this step if you do not need GPU acceleration). I installed version 11.8, download from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_522.06_windows.exe

Building on Windows

md build
cd build
Without GPU acceleration: cmake ..
With GPU acceleration: cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release

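Building on Linux

mkdir build && cd build
Check which AVX instruction sets the CPU supports: cat /proc/cpuinfo | grep avx

Set the build options according to your CPU.
Supported options: LLAMA_AVX LLAMA_AVX2 LLAMA_AVX512 LLAMA_AVX512_VBMI LLAMA_AVX512_VNNI
If the Linux machine has an NVIDIA GPU, you can also install CUDA and use GPU acceleration.
Replace *** with the matching option name, for example -DLLAMA_AVX2=ON:
cmake .. -DLLAMA_***=ON
cmake --build . --config Release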
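
Picking the options by hand from the grep output is easy to get wrong, so the mapping can be sketched in Python. This is only a sketch: the helper name avx_cmake_options is hypothetical, and the spellings of the cpuinfo flags (avx512f, avx512vbmi, avx512_vnni) are my assumption about how Linux reports these features, so check them against your own /proc/cpuinfo.

```python
# Hypothetical helper, not part of llama.cpp.
# Assumed mapping from /proc/cpuinfo flag names to the cmake options listed above.
FLAG_TO_OPTION = {
    "avx": "LLAMA_AVX",
    "avx2": "LLAMA_AVX2",
    "avx512f": "LLAMA_AVX512",
    "avx512vbmi": "LLAMA_AVX512_VBMI",
    "avx512_vnni": "LLAMA_AVX512_VNNI",
}


def avx_cmake_options(cpuinfo_text):
    """Return a -D<option>=ON argument for every AVX feature found in cpuinfo_text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags : ..." lines list the instruction-set extensions.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return [f"-D{opt}=ON" for flag, opt in FLAG_TO_OPTION.items() if flag in flags]
```

Feeding it the contents of /proc/cpuinfo gives the arguments to append to cmake .. in the configure step.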

OpenBLAS acceleration on Windows (optional)

Install pkgconfig

# Run in PowerShell as administrator to install choco
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
# Install pkgconfiglite; it is installed automatically to C:\ProgramData\chocolatey\lib\pkgconfiglite\tools\pkg-config-lite-0.28-1\bin
choco install pkgconfiglite

Download OpenBLAS
wget https://github.com/xianyi/OpenBLAS/releases/download/v0.3.24/OpenBLAS-0.3.24-x64.zip

Edit openblas.pc, located in the lib directory

# lib path
libdir=D:/project/OpenBLAS/lib
libsuffix=
# include path
includedir=D:/project/OpenBLAS/include/openblas
openblas_config=USE_64BITINT= NO_CBLAS= NO_LAPACK= NO_LAPACKE= DYNAMIC_ARCH=OFF DYNAMIC_OLDER=OFF NO_AFFINITY=1 USE_OPENMP= generic MAX_THREADS=12 
Name: OpenBLAS
Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version
Version: 
URL: https://github.com/xianyi/OpenBLAS
Libs:  -L${libdir} -lopenblas${libsuffix} 
Cflags: -I${includedir}
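
The two path edits in openblas.pc can also be applied with a short script. A minimal sketch, assuming the file uses plain libdir= and includedir= lines as in the listing above; set_pc_paths is a hypothetical helper name, and the paths you pass in depend on where you unpacked OpenBLAS.

```python
def set_pc_paths(pc_text, libdir, includedir):
    """Return pc_text with its libdir= and includedir= lines pointing at the given paths."""
    out = []
    for line in pc_text.splitlines():
        if line.startswith("libdir="):
            line = "libdir=" + libdir
        elif line.startswith("includedir="):
            line = "includedir=" + includedir
        # Every other line (Name, Libs, Cflags, ...) is kept unchanged.
        out.append(line)
    return "\n".join(out) + "\n"
```

Read openblas.pc, pass its text through set_pc_paths with your own directories, and write the result back.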

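Copy openblas.pc into the pkgconfiglite directory
Add an environment variable named PKG_CONFIG with the value C:\ProgramData\chocolatey\lib\pkgconfiglite\tools\pkg-config-lite-0.28-1\bin
When building llama.cpp with cmake, add the options -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
OpenBLAS and cuBLAS cannot be combined: when I tried OpenBLAS + cuBLAS, the replies were all garbled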
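
Since the two backends must not be enabled together, a build script can refuse that combination up front. The sketch below is hypothetical (cmake_configure_args is not part of llama.cpp); it only assembles the configure command from the options used in this post.

```python
def cmake_configure_args(cublas=False, openblas=False):
    """Build the argument list for the cmake configure step."""
    if cublas and openblas:
        # Mixing the two produced garbled replies in my test, so reject it early.
        raise ValueError("enable either cuBLAS or OpenBLAS, not both")
    args = ["cmake", ".."]
    if cublas:
        args.append("-DLLAMA_CUBLAS=ON")
    if openblas:
        args += ["-DLLAMA_BLAS=ON", "-DLLAMA_BLAS_VENDOR=OpenBLAS"]
    return args
```

The returned list can be passed to subprocess.run from inside the build directory.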

Downloading the model

huggingface is currently unreachable from mainland China. Without a proxy, the model can only be downloaded with third-party tools.

Option 1, download with git: git clone https://huggingface.co/FlagAlpha/Llama2-Chinese-7b-Chat
Option 2, use a model download tool: git clone https://github.com/git-cloner/aliendao.git
pip install -r requirements.txt
python model_download.py --repo_id FlagAlpha/Llama2-Chinese-7b-Chat --mirror

Converting the model

cd llama.cpp
pip install -r requirements.txt
Put the downloaded model into the models/7B/ folder; for a 13B model, use models/13B/
python convert.py models/7B/
./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-q4_k.gguf q4_k
Several quantization types are available; q4_k is recommended

Launching

# windows
build\bin\Release\server.exe -m models\7B\ggml-model-q4_k.gguf ^
	-c 2048 ^
	-b 4096 ^
	-t 8 ^
	--host 127.0.0.1 ^
	--port 8090

# linux
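./build/bin/server -m models/7B/ggml-model-q4_k.gguf -c 2048 -b 4096 -t 8 --host 127.0.0.1 --port 8090

The -t parameter sets the number of threads; choose it according to your hardware
For more parameters, run server -h
To use the model directly from the command line: main -m models\7B\ggml-model-q4_k.gguf -ins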
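
Once the server is listening on 127.0.0.1:8090, it can be queried over HTTP. The sketch below assumes the server build exposes a POST /completion endpoint that accepts prompt and n_predict and returns a content field, as the llama.cpp server example did when I wrote this; check the server README for your version. The function names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, host="127.0.0.1", port=8090, n_predict=64):
    """Build (but do not send) a POST request for the assumed /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        # The "content" field name is an assumption about the response format.
        return json.loads(resp.read())["content"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload without a running server.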

Common errors

No CUDA toolset found
This usually means Visual Studio Integration was not selected when CUDA was installed.
Copy the files in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\extras\visual_studio_integration\MSBuildExtensions into the BuildCustomizations folder of your Visual Studio installation. For VS2022, for example, the BuildCustomizations path is MSBuild\Microsoft\VC\v170\BuildCustomizations; for other Visual Studio versions, search online for "No CUDA toolset found".

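Performance checks

Check that cuBLAS is actually being used (if cuBLAS acceleration is enabled)
main.exe -ngl 200000 -p "Please sir, may I have some" -m <model path>
Output similar to the following indicates that cuBLAS is working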

llama_model_load_internal: [cublas] offloading 60 layers to GPU
llama_model_load_internal: [cublas] offloading output layer to GPU
llama_model_load_internal: [cublas] total VRAM used: 17223 MB
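
If you capture the load log to a file, the check can be automated. A minimal sketch, assuming the log contains lines worded exactly like the three above; the wording differs between llama.cpp versions, so the patterns may need adjusting, and cublas_summary is a hypothetical helper name.

```python
import re


def cublas_summary(log_text):
    """Summarise the [cublas] lines of a load log like the one shown above."""
    layers = re.search(r"\[cublas\] offloading (\d+) layers to GPU", log_text)
    vram = re.search(r"\[cublas\] total VRAM used: (\d+) MB", log_text)
    return {
        # 0 layers means no offloading line was found, i.e. cuBLAS is probably not in use.
        "offloaded_layers": int(layers.group(1)) if layers else 0,
        "vram_mb": int(vram.group(1)) if vram else None,
    }
```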

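Check that the thread count is reasonable. More threads is not always better; too many threads can actually reduce token generation speed.
Run the following command several times with a different thread count each time, and compare the prompt eval time in the output log
main.exe -ngl 200000 -p "Please sir, may I have some" -m <model path> -t <thread count>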
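
Comparing the runs by eye gets tedious after a few tries, so the comparison can be scripted. The sketch below assumes each saved log contains a line of the form "prompt eval time = <number> ms"; that shape is my assumption about the timing output, and both function names are hypothetical.

```python
import re


def prompt_eval_ms(log_text):
    """Extract the prompt eval time in milliseconds, or None if the line is missing."""
    m = re.search(r"prompt eval time\s*=\s*([\d.]+)\s*ms", log_text)
    return float(m.group(1)) if m else None


def fastest_thread_count(logs_by_threads):
    """Given {thread_count: log_text}, return the thread count with the lowest prompt eval time."""
    timed = {t: prompt_eval_ms(log) for t, log in logs_by_threads.items()}
    # Drop runs whose log had no timing line.
    timed = {t: ms for t, ms in timed.items() if ms is not None}
    return min(timed, key=timed.get) if timed else None
```

Run main.exe once per candidate thread count, save each log under its thread count, and pass the dictionary to fastest_thread_count.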

Permalink: https://www.kill8g.com/2023/09/13/llama-cpp/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

山中

Developer
