Deploying a Model with Docker + GPU

Chris Harris

Environment

$ screenfetch
chris@princess
OS: Ubuntu 24.04 noble
Kernel: x86_64 Linux 6.8.0-84-generic
Uptime: 1h 8m
Packages: 983
Shell: zsh 5.9
Disk: 74G / 3.6T (3%)
CPU: Intel Core i7-14700K @ 28x 5.5GHz [34.0°C]
GPU: NVIDIA GeForce RTX 4090
RAM: 12513MiB / 64035MiB

Installing the NVIDIA Container Toolkit

The detailed installation steps are below.

Installation

  1. Configure the production repository:

    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    Optionally, configure the repository to use experimental packages:

    sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update the packages list from the repository:

    sudo apt-get update
  3. Install the NVIDIA Container Toolkit packages:

    export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
    sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
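
To confirm the packages installed correctly, you can check the CLI version; the reported version should match the one pinned above:

nvidia-ctk --version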

Configuration

  1. Configure the container runtime by using the nvidia-ctk command:

    sudo nvidia-ctk runtime configure --runtime=docker

    The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host so that Docker can use the NVIDIA Container Runtime; a sample of the resulting file is shown after these steps.

  2. Restart the Docker daemon:

    sudo systemctl restart docker
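
For reference, after the configure command runs, /etc/docker/daemon.json typically contains a runtime entry like the following (a minimal sketch; your file may carry additional settings):

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}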

Rootless mode

To configure the container runtime for Docker running in Rootless mode, follow these steps:

  1. Configure the container runtime by using the nvidia-ctk command:

    nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

  2. Restart the Rootless Docker daemon:

    systemctl --user restart docker

  3. Configure /etc/nvidia-container-runtime/config.toml by using the sudo nvidia-ctk command:

    sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
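
Whether in rootful or rootless mode, a quick smoke test is to run nvidia-smi in a throwaway container; if the driver and toolkit are set up correctly, it prints the same GPU table as on the host:

docker run --rm --gpus all ubuntu nvidia-smi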

Configuring an apt proxy (optional)

The repository is hosted on github.io, so an apt proxy is configured to speed up downloads.

A trojan client was installed on this server earlier.

  • Start trojan

$ cat ~/trojan.sh 
/opt/trojan/trojan -c /opt/trojan/config.json -l /var/log/trojan/trojan.log 2>&1 &
$ ~/trojan.sh
  • Configure the apt proxy

The socks5h scheme makes DNS resolution happen on the proxy server instead of locally.

$ sudo tee /etc/apt/apt.conf.d/proxy.conf <<EOF
Acquire::http::Proxy "socks5h://127.0.0.1:1080";
Acquire::https::Proxy "socks5h://127.0.0.1:1080";
EOF

When the proxy is no longer needed, disable it with sudo mv /etc/apt/apt.conf.d/proxy.conf /etc/apt/apt.conf.d/proxy.conf.bak.
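
To verify the proxy is reachable before pointing apt at it, you can fetch one of the repository URLs through it with curl (assuming trojan is listening on 127.0.0.1:1080 as configured above):

curl -x socks5h://127.0.0.1:1080 -sI https://nvidia.github.io/libnvidia-container/gpgkey | head -n 1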

model_api.py and Dockerfile

model_api.py

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

MODEL_CACHE = os.environ.get("HF_HOME", "/opt/hf_cache")
os.environ["HF_HOME"] = MODEL_CACHE
# TRANSFORMERS_CACHE is deprecated in newer versions of transformers
# os.environ["TRANSFORMERS_CACHE"] = MODEL_CACHE

import uvicorn
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load a pretrained model
classifier = pipeline(
    "sentiment-analysis",
    # Gotcha: the official id is distilbert/distilbert-base-uncased-finetuned-sst-2-english,
    # but hf-mirror.com cannot fetch config.json under that id
    "distilbert-base-uncased-finetuned-sst-2-english",
    device=0,
    # read the Hugging Face token from the environment instead of hard-coding it
    token=os.environ.get("HF_TOKEN"),
)


# Request body model
class TextInput(BaseModel):
    text: str


# Wrap the model in an API endpoint
@app.post("/predict-sentiment/")
async def predict_sentiment(input: TextInput):
    # result = classifier(input.text)
    # The handler is async, so run the blocking inference in a thread pool
    # to avoid stalling the event loop
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, classifier, input.text)
    return {"result": result}


# Start the application
if __name__ == '__main__':
    uvicorn.run("model_api:app", host="0.0.0.0", port=8000, reload=True)

Dockerfile

FROM python:3.12
LABEL authors="chris"

WORKDIR /app
ENV HF_HOME=/app/models
ENV HF_ENDPOINT=https://hf-mirror.com

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Pre-download the model at build time; with HF_HOME set above it lands in
# /app/models/hub, the same cache the app reads at runtime (passing an explicit
# cache_dir here would put the files in a different location than HF_HOME uses)
RUN python -c "from transformers import AutoModelForSequenceClassification, AutoTokenizer; \
AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english'); \
AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')"


COPY ./model_api.py .

CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "model_api:app", "--bind", "0.0.0.0:8000", "--workers", "4", "--timeout", "60"]
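
The Dockerfile copies a requirements.txt that the original post does not show; a minimal version for this service would look roughly like this (unpinned and illustrative, not the author's actual file):

fastapi
pydantic
uvicorn
gunicorn
transformers
torch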

Running the Image

Building the image

The --progress=plain flag shows the detailed build output.

docker build -t model-api:1.0.0 . --progress=plain

Running the container

docker run --gpus all -d -p 80:8000  --name model-api model-api:1.0.0
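
With the container up, the endpoint can be exercised from the host (port 80 on the host maps to 8000 in the container); the sentiment pipeline returns a list with a label and a confidence score, so the response should have the shape {"result": [{"label": ..., "score": ...}]}:

curl -X POST http://localhost/predict-sentiment/ \
    -H "Content-Type: application/json" \
    -d '{"text": "Docker with GPUs is great!"}'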

Checking the container status

$ docker exec -it model-api bash                    
root@b65bfeae5313:/app# python
Python 3.12.11 (main, Sep 8 2025, 22:53:21) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

If this prints True, the GPU is available inside the container; you can also run nvidia-smi in the container to check GPU status, as shown below.
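
The toolkit injects the driver utilities into the container, so nvidia-smi works even though the python:3.12 image never installs it:

docker exec -it model-api nvidia-smi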


Reference: Installing the NVIDIA Container Toolkit

  • Title: Deploying a Model with Docker + GPU
  • Author: Chris Harris
  • Created: 2025-08-29 00:59:57
  • Updated: 2025-09-29 01:00:33
  • Link: https://s4g.top/2025/08/29/docker-gpu部署模型/
  • License: This article is licensed under CC BY-NC-SA 4.0.