Deploying a Model with Docker + GPU

Chris Harris

Environment

$ screenfetch
chris@princess
OS: Ubuntu 24.04 noble
Kernel: x86_64 Linux 6.8.0-84-generic
Uptime: 1h 8m
Packages: 983
Shell: zsh 5.9
Disk: 74G / 3.6T (3%)
CPU: Intel Core i7-14700K @ 28x 5.5GHz [34.0°C]
GPU: NVIDIA GeForce RTX 4090
RAM: 12513MiB / 64035MiB

Installing the NVIDIA Container Toolkit

The detailed installation steps are below.

Installation

  1. Configure the production repository:

    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    Optionally, configure the repository to use experimental packages:

    sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update the packages list from the repository:

    sudo apt-get update
  3. Install the NVIDIA Container Toolkit packages:

    export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
    sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
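
To confirm the packages installed correctly, you can check the CLI version; the reported version should match the one pinned above:

nvidia-ctk --version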

Configuration

  1. Configure the container runtime by using the nvidia-ctk command:

    sudo nvidia-ctk runtime configure --runtime=docker

    The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host so that Docker can use the NVIDIA Container Runtime; a sample of the resulting file is shown after these steps.

  2. Restart the Docker daemon:

    sudo systemctl restart docker
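
For reference, after the configure command runs, /etc/docker/daemon.json typically contains a runtime entry like the following (a minimal sketch; your file may carry additional settings):

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}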

Rootless mode

To configure the container runtime for Docker running in Rootless mode, follow these steps:

  1. Configure the container runtime by using the nvidia-ctk command:

    nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

  2. Restart the Rootless Docker daemon:

    systemctl --user restart docker

  3. Configure /etc/nvidia-container-runtime/config.toml by using the sudo nvidia-ctk command:

    sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
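
Whether in rootful or rootless mode, a quick smoke test is to run nvidia-smi in a throwaway container; if the driver and toolkit are set up correctly, it prints the same GPU table as on the host:

docker run --rm --gpus all ubuntu nvidia-smi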

Configuring an apt proxy (optional)

The repository is hosted on github.io, so an apt proxy is configured to speed up downloads.

A trojan client was installed on this server earlier.

  • Start trojan

$ cat ~/trojan.sh 
/opt/trojan/trojan -c /opt/trojan/config.json -l /var/log/trojan/trojan.log 2>&1 &
$ ~/trojan.sh
  • Configure the apt proxy

The socks5h scheme makes DNS resolution happen on the proxy server instead of locally.

$ sudo tee /etc/apt/apt.conf.d/proxy.conf <<EOF
Acquire::http::Proxy "socks5h://127.0.0.1:1080";
Acquire::https::Proxy "socks5h://127.0.0.1:1080";
EOF

When the proxy is no longer needed, disable it with sudo mv /etc/apt/apt.conf.d/proxy.conf /etc/apt/apt.conf.d/proxy.conf.bak.
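
To verify the proxy is reachable before pointing apt at it, you can fetch one of the repository URLs through it with curl (assuming trojan is listening on 127.0.0.1:1080 as configured above):

curl -x socks5h://127.0.0.1:1080 -sI https://nvidia.github.io/libnvidia-container/gpgkey | head -n 1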

model_api.py and Dockerfile

model_api.py

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

MODEL_CACHE = os.environ.get("HF_HOME", "/opt/hf_cache")
os.environ["HF_HOME"] = MODEL_CACHE
# TRANSFORMERS_CACHE is deprecated in newer versions of transformers
# os.environ["TRANSFORMERS_CACHE"] = MODEL_CACHE

import uvicorn
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load a pretrained model
classifier = pipeline(
    "sentiment-analysis",
    # Gotcha: the official id is distilbert/distilbert-base-uncased-finetuned-sst-2-english,
    # but hf-mirror.com cannot fetch config.json under that id
    "distilbert-base-uncased-finetuned-sst-2-english",
    device=0,
    # read the Hugging Face token from the environment instead of hard-coding it
    token=os.environ.get("HF_TOKEN"),
)


# Request body model
class TextInput(BaseModel):
    text: str


# Wrap the model in an API endpoint
@app.post("/predict-sentiment/")
async def predict_sentiment(input: TextInput):
    # result = classifier(input.text)
    # The handler is async, so run the blocking inference in a thread pool
    # to avoid stalling the event loop
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, classifier, input.text)
    return {"result": result}


# Start the application
if __name__ == '__main__':
    uvicorn.run("model_api:app", host="0.0.0.0", port=8000, reload=True)

Dockerfile

FROM python:3.12
LABEL authors="chris"

WORKDIR /app
ENV HF_HOME=/app/models
ENV HF_ENDPOINT=https://hf-mirror.com

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Pre-download the model at build time; with HF_HOME set above it lands in
# /app/models/hub, the same cache the app reads at runtime (passing an explicit
# cache_dir here would put the files in a different location than HF_HOME uses)
RUN python -c "from transformers import AutoModelForSequenceClassification, AutoTokenizer; \
AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english'); \
AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')"


COPY ./model_api.py .

CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "model_api:app", "--bind", "0.0.0.0:8000", "--workers", "4", "--timeout", "60"]
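
The Dockerfile copies a requirements.txt that the original post does not show; a minimal version for this service would look roughly like this (unpinned and illustrative, not the author's actual file):

fastapi
pydantic
uvicorn
gunicorn
transformers
torch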

Running the Image

Building the image

The --progress=plain flag shows the detailed build output.

docker build -t model-api:1.0.0 . --progress=plain

Running the container

docker run --gpus all -d -p 80:8000  --name model-api model-api:1.0.0
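
With the container up, the endpoint can be exercised from the host (port 80 on the host maps to 8000 in the container); the sentiment pipeline returns a list with a label and a confidence score, so the response should have the shape {"result": [{"label": ..., "score": ...}]}:

curl -X POST http://localhost/predict-sentiment/ \
    -H "Content-Type: application/json" \
    -d '{"text": "Docker with GPUs is great!"}'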

Checking the container status

$ docker exec -it model-api bash                    
root@b65bfeae5313:/app# python
Python 3.12.11 (main, Sep 8 2025, 22:53:21) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

If this prints True, the GPU is available inside the container; you can also run nvidia-smi in the container to check GPU status, as shown below.
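
The toolkit injects the driver utilities into the container, so nvidia-smi works even though the python:3.12 image never installs it:

docker exec -it model-api nvidia-smi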


Reference: Installing the NVIDIA Container Toolkit

  • Title: Deploying a Model with Docker + GPU
  • Author: Chris Harris
  • Created: 2025-08-29 00:59:57
  • Updated: 2025-09-29 01:00:33
  • Link: https://s4g.top/2025/08/29/docker-gpu部署模型/
  • License: This article is licensed under CC BY-NC-SA 4.0.