LLM Xinference Installation and Usage (Supports CPU, Metal, and CUDA Inference and Distributed Deployment)

1. Detailed Steps

1.1 Installation

# CUDA/CPU
pip install "xinference[transformers]"
pip install "xinference[vllm]"
pip install "xinference[sglang]"

# Metal (MPS)
pip install "xinference[mlx]"
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

Note: In my setup llama-cpp-python could not be used on CUDA, possibly because of personal environment configuration such as the nvcc version (it works normally in a C/C++ environment). llama-cpp-python on Metal works normally. To install dependencies such as flashinfer, see the official installation documentation: https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html
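The platform split above can be sketched as a small shell helper that prints the matching pip commands instead of running them. The function name `xinference_install_cmds` and the rule "Darwin means Metal, everything else means CUDA/CPU" are assumptions made for illustration, not part of Xinference.

```shell
# Hypothetical helper: print the install commands from this section
# that match the given OS name (defaults to the output of `uname -s`).
xinference_install_cmds() {
  local os
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin)
      # Metal (MPS) path
      echo 'pip install "xinference[mlx]"'
      echo 'CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python'
      ;;
    *)
      # CUDA/CPU path
      echo 'pip install "xinference[transformers]"'
      echo 'pip install "xinference[vllm]"'
      echo 'pip install "xinference[sglang]"'
      ;;
  esac
}
```

Calling `xinference_install_cmds` with no argument prints the commands for the current machine, so they can be reviewed before being run by hand.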

1.2 Startup

1.2.1 Direct Startup

Minimal command

xinference-local --host 0.0.0.0 --port 9997

Command with multiple parameters

Set the model cache path and the model source (Hugging Face/ModelScope)

# CUDA/CPU
XINFERENCE_HOME=/path/.xinference XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0 --port 9997

# Metal (MPS)
XINFERENCE_HOME=/path/.xinference XINFERENCE_MODEL_SRC=modelscope PYTORCH_ENABLE_MPS_FALLBACK=1 xinference-local --host 0.0.0.0 --port 9997
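The two variants above differ only in the extra PYTORCH_ENABLE_MPS_FALLBACK=1 variable, so they can be sketched as one helper that assembles the command line. The function name `xinference_local_cmd`, its argument order, and the defaults are hypothetical. The helper only prints the command, and it assumes the cache path contains no spaces.

```shell
# Hypothetical helper: build the xinference-local command line.
# Arguments: cache dir, model source, port, OS name (defaults to `uname -s`).
xinference_local_cmd() {
  local home src port os vars
  home="$1"
  src="${2:-modelscope}"
  port="${3:-9997}"
  os="${4:-$(uname -s)}"
  vars="XINFERENCE_HOME=$home XINFERENCE_MODEL_SRC=$src"
  # On macOS (Metal/MPS), add the fallback variable used in this section.
  if [ "$os" = "Darwin" ]; then
    vars="$vars PYTORCH_ENABLE_MPS_FALLBACK=1"
  fi
  echo "$vars xinference-local --host 0.0.0.0 --port $port"
}
```

For example, `xinference_local_cmd /path/.xinference modelscope 9997` prints the command for the current machine. Printing rather than executing keeps the sketch safe to try on a machine where Xinference is not installed yet.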

1.2.2 Cluster Deployment

Use ifconfig to find the current server's IP address.

1.2.2.1 Start the Supervisor on the main server

# Format
xinference-supervisor -H <current server IP (main server IP)> --port 9997

# Example
xinference-supervisor -H 192.168.31.100 --port 9997

1.2.2.2 Start a Worker on each of the other servers

# Format
xinference-worker -e "http://<main server IP>:9997" -H <current server IP (worker server IP)>

# Example
xinference-worker -e "http://192.168.31.100:9997" -H 192.168.31.101

Note: Add environment variables such as XINFERENCE_HOME, XINFERENCE_MODEL_SRC, and PYTORCH_ENABLE_MPS_FALLBACK as needed by placing them before the command at startup.
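For a cluster with several workers, the supervisor and worker commands above follow one pattern. The sketch below prints them for a given set of IP addresses. The function name `xinference_cluster_cmds` and its argument order are hypothetical, and each printed line still has to be run on the matching server.

```shell
# Hypothetical helper: print one supervisor command and one worker
# command per worker IP, following the format shown in this section.
# Arguments: supervisor IP, port, then any number of worker IPs.
xinference_cluster_cmds() {
  local supervisor_ip port worker_ip
  supervisor_ip="$1"
  port="$2"
  shift 2
  echo "xinference-supervisor -H $supervisor_ip --port $port"
  for worker_ip in "$@"; do
    echo "xinference-worker -e \"http://$supervisor_ip:$port\" -H $worker_ip"
  done
}
```

With the example addresses from this section, `xinference_cluster_cmds 192.168.31.100 9997 192.168.31.101` prints the same two commands shown above.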

1.3 Usage

Open http://<main server IP>:9997/docs to view the API documentation, and open http://<main server IP>:9997 to use the service.
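A quick way to confirm the service is up is to request the documentation page from the command line. The sketch below is a minimal reachability check under stated assumptions: the function names `xinference_urls` and `xinference_check` are hypothetical, curl must be installed, and the port defaults to 9997 as in the commands above.

```shell
# Hypothetical helper: print the web UI URL and the API docs URL.
# Arguments: main server IP, port (defaults to 9997).
xinference_urls() {
  local host port
  host="$1"
  port="${2:-9997}"
  echo "http://$host:$port"
  echo "http://$host:$port/docs"
}

# Hypothetical helper: succeed only if the docs page answers with a
# non-error HTTP status. The response body is discarded.
xinference_check() {
  curl -fsS -o /dev/null "http://$1:${2:-9997}/docs"
}
```

For example, `xinference_check 192.168.31.100 && echo "Xinference is reachable"` reports success once the Supervisor has started.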

2. References

2.1 Xinference

2.1.1 Deployment Documentation

Running Xinference locally

https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html#run-xinference-locally

Deploying Xinference in a cluster

https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html#deploy-xinference-in-a-cluster

2.1.2 Installation Documentation

Official page

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html

Transformers engine

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#transformers-backend

vLLM engine

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#vllm-backend

Llama.cpp engine

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#llama-cpp-backend

MLX engine

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#mlx-backend

3. Resources

3.1 Xinference

3.1.1 GitHub

Official page

https://github.com/xorbitsai/inference

https://github.com/xorbitsai/inference/blob/main/README_zh_CN.md

3.1.2 Installation Documentation

SGLang engine

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#sglang-backend

Other platforms (installing on Ascend NPU)

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation.html#other-platforms

https://inference.readthedocs.io/zh-cn/latest/getting_started/installation_npu.html#installation-npu
