Running chatglm2-6b on a Mac

2023-06-30 tech mac ai llm

I ran this on a Mac Studio with an M2 Max. This post records the process. My setup:

MacOS 13.4
Unified memory: 96 GB
conda 23.5.0
Python 3.11.4
pip 23.1.2

Update (2023-10-09):

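I deleted all of my local models and updated the code to use the default THUDM/chatglm2-6b. It works in the CLI demo and in web_demo2, but the web front-end page shows truncated output. I don't know why.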

I'm curious where it loaded the model from. I had deleted every local model and was also offline. 🤷
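One place worth checking is the Hugging Face download cache. This post's own download path (~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/...) follows that layout. Below is a minimal sketch that lists any cached snapshots for a repo. It assumes the cache root follows huggingface_hub's usual defaults: HF_HUB_CACHE, else $HF_HOME/hub, else ~/.cache/huggingface/hub. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Guess the hub cache root (assumed to mirror huggingface_hub's defaults)."""
    if os.getenv("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    xdg = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    hf_home = os.getenv("HF_HOME", os.path.join(xdg, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def repo_folder_name(repo_id: str) -> str:
    """"THUDM/chatglm2-6b" -> "models--THUDM--chatglm2-6b" (the layout seen in this post)."""
    return "models--" + repo_id.replace("/", "--")


def find_cached_snapshots(repo_id: str, cache_dir: Optional[Path] = None) -> list:
    """Return the snapshot folders cached for repo_id, or [] if none exist."""
    root = cache_dir if cache_dir is not None else hub_cache_dir()
    snapshots = root / repo_folder_name(repo_id) / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


if __name__ == "__main__":
    for snap in find_cached_snapshots("THUDM/chatglm2-6b"):
        print(snap, sorted(f.name for f in snap.iterdir()))
```

If this prints a snapshot folder, the model was most likely loaded from there. Deleting that folder (or the whole models--THUDM--chatglm2-6b directory) is what actually removes the local copy.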

Then I noticed the model had gone from 8 files back to 7, and the mps GPU worked again.

I tested it several times. Once a conversation runs even slightly longer, memory usage shoots up until it nearly runs out.

Honestly, a bit disheartening.

Update (2023-08-18):

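The model had gone from 7 files to 8, and my old environment would no longer start, so I re-downloaded the new model and updated the code. I mainly followed the "Mac deployment" and "loading the model locally" sections of this document on GitHub. The only change needed is switching to mps: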
model = AutoModel.from_pretrained("/Users/kelu/Documents/huggingface/chatglm-6b", trust_remote_code=True).half().to('mps')
But I couldn't get a normal conversation out of it:

web_demo2.py has the same problem:
streamlit run web_demo2.py
With no other option, I switched to the CPU version, and the CPU version runs.
model = AutoModel.from_pretrained("/Users/kelu/Documents/huggingface/chatglm-6b", trust_remote_code=True).float()
If you suspect a PyTorch version problem, this snippet prints the basics:
import torch
import transformers

print(f"PyTorch version: {torch.__version__}")
print(f"transformers version: {transformers.__version__}")

# Check PyTorch has access to MPS (Metal Performance Shader, Apple's GPU architecture)
print(f"Is MPS (Metal Performance Shader) built? {torch.backends.mps.is_built()}")
print(f"Is MPS available? {torch.backends.mps.is_available()}")

# Set the device
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Create data and send it to the device
x = torch.rand(size=(3, 4)).to(device)
print(x)
My PyTorch version is 2.0.1. I also tried the nightly build:

On the GPU, both versions give the same error as above. The CPU works, so I left it at that.
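Because mps errored for me while cpu worked, it helps to make the device choice explicit and switchable instead of editing the loading line each time. Below is a minimal sketch, assuming torch and transformers are installed. `choose_device` and `load_chatglm` are hypothetical helpers. The half()-on-mps / float()-on-cpu split simply mirrors the two loading lines used in this post.

```python
def choose_device(force_cpu: bool = False) -> str:
    """Return "mps" if PyTorch reports MPS as available, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:  # no torch installed: CPU is the only option
        return "cpu"
    return "mps" if torch.backends.mps.is_available() else "cpu"


def load_chatglm(model_path: str, force_cpu: bool = False):
    """Load tokenizer and model: half precision on mps, float32 on cpu."""
    from transformers import AutoModel, AutoTokenizer

    device = choose_device(force_cpu)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    model = model.half().to("mps") if device == "mps" else model.float()
    return tokenizer, model.eval(), device
```

For example, `tokenizer, model, device = load_chatglm("/Users/kelu/Documents/huggingface/chatglm-6b", force_cpu=True)` reproduces the CPU setup that worked here.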

I. Environment setup

If you're not yet comfortable with Python, see my earlier posts on conda, and work inside a virtual environment:
conda create -n chatglm2_env python=3.11
conda activate chatglm2_env
To leave the environment:
conda deactivate

Download the source code from GitHub: https://github.com/THUDM/ChatGLM2-6B

git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B

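Install the dependencies from a mirror in China (Tsinghua), otherwise it is very slow. (Fitting, since Tsinghua co-released the project.)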

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
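Before running anything, it is worth confirming that the activated conda environment is the one pip just installed into. Below is a minimal stdlib sketch using importlib.metadata. The package list is an assumption drawn from the packages this post relies on, and `installed_versions` is a hypothetical helper.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # The interpreter path should point inside the chatglm2_env environment
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, ver in installed_versions(["torch", "transformers", "gradio", "streamlit"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```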

II. Download the model

https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/

You can also download the model from huggingface.co, which requires Git LFS:
brew install git-lfs
Also install gradio, which web_demo.py uses:
pip install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple

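You can point the code at the downloaded model folder. What I did instead was run python web_demo.py, wait for the download to start, and then replace the files in the cache directly.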

My default download path was:

~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379

Overwrite these .bin files with the ones you downloaded:
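The number of .bin shards changed between releases (7, then 8, then 7 again, per the updates above). After swapping files by hand, it is worth checking that the folder matches its own shard index. Below is a minimal sketch. It assumes the checkpoint ships the standard Hugging Face sharded-checkpoint index `pytorch_model.bin.index.json` with a `weight_map` from parameter name to shard file. The function names are hypothetical.

```python
import json
from pathlib import Path

INDEX_NAME = "pytorch_model.bin.index.json"  # assumed standard sharded-checkpoint index


def shards_missing(weight_map, present_files):
    """Shard files referenced in weight_map (param name -> file) that are not present."""
    expected = sorted(set(weight_map.values()))
    return [name for name in expected if name not in present_files]


def check_model_dir(model_dir):
    """Compare a model folder against the shard list in its own index file."""
    model_dir = Path(model_dir).expanduser()
    index = json.loads((model_dir / INDEX_NAME).read_text(encoding="utf-8"))
    # is_file() follows symlinks, so a cache symlink to a missing blob counts as missing
    present = {p.name for p in model_dir.iterdir() if p.is_file()}
    return shards_missing(index["weight_map"], present)


if __name__ == "__main__":
    import sys

    missing = check_model_dir(sys.argv[1])
    print("complete" if not missing else f"missing: {missing}")
```

Run it with the snapshot folder as the argument, for example the ~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/... path above. A mix of old and new shards usually shows up as missing file names.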

Alternatively, change the model path in the code itself, for example in:

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).cuda()

replace THUDM/chatglm2-6b with your local path. Mine is:

"/Users/kelu/Documents/huggingface/chatglm-6b"
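Because the same string goes to both from_pretrained calls, a small helper can use the local folder when it exists and fall back to the hub id otherwise. This is a minimal sketch. `resolve_model_source` is hypothetical, and treating a folder as usable when it contains config.json is an assumption.

```python
from pathlib import Path

DEFAULT_REPO_ID = "THUDM/chatglm2-6b"


def resolve_model_source(local_dir, repo_id=DEFAULT_REPO_ID):
    """Return local_dir if it looks like a model folder, else the hub repo id."""
    path = Path(local_dir).expanduser()
    # Assumption: any usable local model folder contains config.json
    if (path / "config.json").is_file():
        return str(path)
    return repo_id
```

For example, `source = resolve_model_source("/Users/kelu/Documents/huggingface/chatglm-6b")`, then pass `source` to both `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained`.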

III. Run the demos

Select the interpreter in VS Code:

cmd + p
> interpreter

1. webdemo

python web_demo.py

Note the warning:

/modeling_chatglm.py:1173: UserWarning: MPS: no support for int64 min/max ops, casting it to int32 (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1682343668887/work/aten/src/ATen/native/mps/operations/ReduceOps.mm:1271.)
  if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
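Judging by its own text, this warning is informational: PyTorch casts the int64 reduction to int32. The line it points at reduces what appears to be a tensor of 0/1 "still generating" flags, so the cast looks harmless here. If it clutters the console, the stdlib warnings filter can hide it. Below is a minimal sketch. `silence_mps_int64_warning` is a hypothetical helper and must be called before generation starts.

```python
import warnings


def silence_mps_int64_warning():
    """Ignore PyTorch's 'MPS: no support for int64 min/max ops' UserWarning."""
    # message is a regex matched against the start of the warning text
    warnings.filterwarnings(
        "ignore",
        message=r"MPS: no support for int64 min/max ops",
        category=UserWarning,
    )
```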

To run demo2, install streamlit first:

pip install streamlit streamlit-chat -i https://pypi.tuna.tsinghua.edu.cn/simple

streamlit run web_demo2.py

2. CLI demo

python cli_demo.py

3. API

Start the server with python api.py, then send a request:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你和chatgpt哪个更好？", "history": []}'
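The same request can be sent from Python with only the standard library, which makes multi-turn chat easier because the returned history can be passed back. Below is a minimal sketch. It assumes the server replies with JSON containing "response" and "history" keys; check this against your copy of api.py. `build_request` and `chat` are hypothetical helpers.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000"


def build_request(prompt, history, url=API_URL):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"prompt": prompt, "history": history}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def chat(prompt, history=None, url=API_URL, timeout=300):
    """Send one turn; return (response, history). Response keys are assumed."""
    req = build_request(prompt, history or [], url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["response"], data.get("history", [])
```

Usage: `answer, history = chat("你和chatgpt哪个更好？")`, then `chat("...", history)` for the next turn. Memory grew quickly with longer conversations in my tests, so passing only the last few turns (e.g. `history[-3:]`) may be worth trying. That is an untested guess.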

IV. Problems I ran into

Whenever the system proxy was on, I got this error:

requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm2-6b/resolve/main/tokenizer_config.json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1002)')))

I searched extensively but couldn't fix it. After turning the proxy off, I got this instead:

     assert os.path.isfile(model_path), model_path
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "<frozen genericpath>", line 30, in isfile
 TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

This error means model files are missing: the file path came back as None.

So with the proxy off, nothing gets downloaded and the model can't be found. With the proxy on, the download itself fails.

Surprisingly, the fix was to switch the proxy to global mode. If you use rule mode instead, add huggingface.co to the rules.
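One alternative, untested in this setup, is to point Python itself at the proxy through the standard HTTP_PROXY / HTTPS_PROXY variables, which requests honors. Separately, once every file is already on disk, HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 tell the Hugging Face libraries not to contact the network at all. Below is a minimal sketch. Port 7890 is a hypothetical placeholder for whatever your proxy client listens on. `configure_hub_network` is a hypothetical helper and must run before transformers is imported.

```python
import os


def configure_hub_network(proxy=None, offline=False, env=None):
    """Set proxy and/or offline variables; defaults to editing os.environ."""
    env = os.environ if env is None else env
    if proxy:
        # Route Python's HTTP(S) traffic through the proxy explicitly
        env["HTTP_PROXY"] = proxy
        env["HTTPS_PROXY"] = proxy
    if offline:
        # Use only files already in the local cache; never hit huggingface.co
        env["HF_HUB_OFFLINE"] = "1"
        env["TRANSFORMERS_OFFLINE"] = "1"
    return env
```

For example, `configure_hub_network(proxy="http://127.0.0.1:7890")` for the first download, and `configure_hub_network(offline=True)` afterwards.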

Runtime error:

   File "/Users/kelu/Workspace/Miniforge3/envs/pytorch_env/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
     raise AssertionError("Torch not compiled with CUDA enabled")
 AssertionError: Torch not compiled with CUDA enabled

The lesson is to read the official docs carefully. Deploying on a Mac requires changing how the model is loaded:

model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
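The demo scripts call .cuda() when loading the model, as in the snippet quoted in section II. Listing every such line shows exactly what to edit. Below is a minimal stdlib sketch, meant to be run from the ChatGLM2-6B checkout. `find_cuda_calls` is a hypothetical helper, and it only detects calls; it does not rewrite them.

```python
import re
from pathlib import Path

CUDA_CALL = re.compile(r"\.cuda\(\)")


def find_cuda_calls(source_text):
    """Return 1-based line numbers in source_text that call .cuda()."""
    return [n for n, line in enumerate(source_text.splitlines(), start=1) if CUDA_CALL.search(line)]


if __name__ == "__main__":
    # Scan the demo scripts in the current directory
    for script in sorted(Path(".").glob("*.py")):
        for lineno in find_cuda_calls(script.read_text(encoding="utf-8")):
            print(f"{script}:{lineno}: replace .cuda() with .to('mps')")
```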