Installing Xinference on macOS: Notes

2025-03-13 tech llm xinference 4 mins 12 images 1422 characters

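Xorbits Inference (Xinference) is a powerful, general-purpose distributed inference framework for large language models (LLMs), speech recognition models, multimodal models, and other model types. It lets you deploy your own models, or built-in state-of-the-art open-source models, with a single command. I set it up mainly so that I could use a rerank model in Dify.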

1. Installation

Official guide: https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html

conda create -n xinference python=3.11
conda activate xinference
pip install "xinference[all]"
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

image-20250313午後32636314

I ran into some errors, which was not surprising:

/private/var/folders/sl/j0g8fv0d5h97tc_xxsy3fkyr0000gn/T/pip-install-6lcea4jt/llama-cpp-python_47e4373c3e314d09bee185d9fcb17bda/vendor/llama.cpp/ggml/src/ggml-quants.c
ninja: build stopped: subcommand failed.

*** CMake build failed
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)

image-20250313午後32211672

Installing ninja fixes it:

brew install ninja
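
Before retrying the build, it can help to confirm that the tool is actually visible on PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name introduced here for illustration, not part of Xinference or Homebrew.

```shell
# Hypothetical helper: print each named tool that is NOT found on PATH,
# one per line. Prints nothing when every tool is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling `missing_tools ninja` should print nothing once the Homebrew install has succeeded; at that point the `pip install llama-cpp-python` command from above can be run again.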

After that, start Xinference:

xinference-local # local access only
xinference-local -H 0.0.0.0 # recommended, since it has to work with Dify

image-20250313午後33149271

image-20250313午後34651133

image-20250313午後33234731
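
To confirm from a terminal that the server is reachable, one option is to query the model listing endpoint. This is a rough sketch that assumes the server is on Xinference's default port 9997 on the same machine; `XINFERENCE_URL`, `models_url`, and `check_server` are names made up for this example.

```shell
# Assumption: Xinference listens on its default port 9997 on this machine.
# Override XINFERENCE_URL if the server runs elsewhere.
XINFERENCE_URL="${XINFERENCE_URL:-http://127.0.0.1:9997}"

# Print the URL of the model listing endpoint (a trailing slash is tolerated).
models_url() {
  printf '%s/v1/models\n' "${XINFERENCE_URL%/}"
}

# Return 0 if the server answers within 5 seconds, non-zero otherwise.
check_server() {
  curl --silent --fail --max-time 5 "$(models_url)" >/dev/null
}
```

For example, `check_server && echo "Xinference is up"` prints the message only when the request succeeds.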

You can switch the UI language to Chinese, which is easier on the eyes:

image-20250313午後40911619

2. Loading a model

Built-in models: https://inference.readthedocs.io/zh-cn/latest/models/builtin/index.html

xinference launch --model-name bge-reranker-large --model-type rerank # load the model

You can also do this from the web UI:

image-20250313午後40457075

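By default, Xinference downloads models from Hugging Face, which I could not reach at all from my network. The fix is to switch the model source to one hosted in China (ModelScope) and restart: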

XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0

The log shows the download finally starting:

image-20250313午後41453455

Download complete:

image-20250313午後45657070
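
Once the model is loaded, a quick rerank request is a reasonable way to check it before wiring it into Dify. The sketch below assumes the default endpoint on port 9997 and assumes the model UID equals the model name `bge-reranker-large`; use the UID actually reported for your model if it differs. The query and documents are sample text, and `rerank_payload` / `rerank_demo` are hypothetical helper names.

```shell
# Assumptions: default local endpoint, and a model UID equal to the model name.
XINFERENCE_URL="${XINFERENCE_URL:-http://127.0.0.1:9997}"
MODEL_UID="${MODEL_UID:-bge-reranker-large}"

# Print a sample JSON body for the rerank endpoint.
rerank_payload() {
  cat <<EOF
{
  "model": "${MODEL_UID}",
  "query": "A man is eating pasta.",
  "documents": [
    "A man is eating food.",
    "A man is riding a horse.",
    "The girl is carrying a baby."
  ]
}
EOF
}

# POST the sample body to /v1/rerank and print the raw JSON response.
rerank_demo() {
  rerank_payload | curl --silent --fail --max-time 60 \
    -X POST "${XINFERENCE_URL%/}/v1/rerank" \
    -H 'Content-Type: application/json' \
    --data @-
}
```

Running `rerank_demo` against a live server should return a relevance score per document; the first call may be slow while the model warms up.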

3. Connecting Dify

Add the model provider:

image-20250313午後45918906

image-20250313午後45958268

Update the knowledge base retrieval settings:

image-20250313午後50127250

Connected successfully 🏅

4. Miscellaneous

xinference terminate --model-uid ${model_uid}   # stop the model
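
If the model UID is not at hand, listing the running models first is one way to find it. The wrappers below are a small sketch assuming the default local endpoint; `list_models` and `stop_model` are hypothetical names, and they only pass arguments through to the `xinference` CLI.

```shell
# Assumption: the server runs at the default local endpoint.
XINFERENCE_URL="${XINFERENCE_URL:-http://127.0.0.1:9997}"

# Show the models currently running on the server, including their UIDs.
list_models() {
  xinference list --endpoint "${XINFERENCE_URL%/}"
}

# Stop one model by UID; aborts with a usage message if no UID is given.
stop_model() {
  local uid="${1:?usage: stop_model MODEL_UID}"
  xinference terminate --endpoint "${XINFERENCE_URL%/}" --model-uid "$uid"
}
```

A typical sequence would be `list_models` to read off the UID, followed by `stop_model` with that UID.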