How to Run vLLM on Windows Docker: A Simple Guide

Novita AI - Jul 30 - Dev Community

Master the deployment of vLLM on Windows Docker for improved efficiency and performance. Get expert insights on our blog today.

Key Highlights

  • In the AI field, Large Language Models (LLMs) play a vital role in applications such as natural language processing and text generation.
  • Trusted platforms offer LLMs as a service, backed by generally well-regarded security and privacy policies.
  • vLLM is a powerful distributed inference library for serving large-scale models.
  • Docker provides an efficient way to containerize applications, making it easy to run vLLM on Windows.
  • With a guide that simplifies running vLLM in Docker on Windows, even new developers can get comfortable with Docker and machine learning.

Introduction

In the era of data science and machine learning, LLMs are vast in size and complexity, and deploying them effectively demands careful attention. vLLM, short for Virtual Large Language Models, has become crucial for advanced NLP applications. Whether you're a data scientist, developer, or researcher, running LLMs efficiently can make a significant difference in your projects. This blog provides a step-by-step process for setting up and running vLLM on Windows using Docker. We'll cover everything from prerequisites to troubleshooting tips to ensure a smooth setup.

Exploring vLLM and Docker

Basics of vLLM

Before diving into Docker specifics, let's briefly cover what vLLM is. vLLM is a high-performance, open-source inference server for large language models, built around the PagedAttention algorithm for efficient management of attention key-value cache memory. It is designed for ease of use and high throughput; the project reports throughput up to 24 times higher than comparable inference solutions. LLMs play a crucial role in numerous NLP tasks, and running them efficiently requires strong computational resources and a properly configured environment, which is where Docker proves useful.

Advantages of vLLM

  • Easy integration with popular models
  • High throughput, serving more requests per second than traditional serving methods
  • Near-zero waste of KV cache memory, with faster query response times
  • An OpenAI-compatible API server (see the example below)
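To make the last point concrete, here is a minimal sketch of querying a running vLLM server through its OpenAI-compatible API. It assumes a server is already listening on the default http://localhost:8000 and serving Llama 3.1 70B (adjust the host, port, and model name to your setup), run from a bash-style shell such as Git Bash or WSL:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3.1-70B-Instruct", "prompt": "Docker is", "max_tokens": 32}'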

Why Use Docker?

Docker is an open-source platform for developing, shipping, deploying, and running containerized applications. It simplifies the configuration and control of software environments through containerization: containers bundle an application together with its dependencies, enabling it to run uniformly across different computing setups. vLLM benefits from this by avoiding setup complications and version discrepancies, making model deployment and administration easier.

How to Run vLLM on Windows Docker

Here we will take Llama 3.1 70B as an example to show how to run vLLM in Docker on Windows. Novita AI also provides an LLM API service for this model; you can visit Model API to see our featured models.

Prerequisites for Running vLLM on Windows Docker

  • Windows 10 or later: Docker Desktop for Windows is compatible with these versions.
  • Docker Desktop: Install Docker Desktop from the official Docker website.
  • NVIDIA GPU with up-to-date drivers: the docker run --gpus flag used later relies on Docker Desktop's WSL 2 backend for GPU passthrough, so WSL 2 and current NVIDIA drivers are needed for GPU inference. You can verify the setup with the commands below.
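Before going further, you can sanity-check WSL 2 and GPU passthrough from a terminal. This is a hedged sketch; the CUDA image tag is only an example and may need updating:

wsl --status
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi lists your GPU from inside the container, GPU passthrough is working.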

Step-by-Step Guide to Running vLLM on Windows Docker

Step 1: Install Docker Desktop

  • Download Docker Desktop: Visit the Docker website and download it for Windows.
  • Install Docker: Run the installer and follow the on-screen instructions. Enable virtualization if prompted.
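To confirm the installation succeeded, you can check the version and run Docker's built-in test image; a printed greeting means the daemon and container runtime are working:

docker --version
docker run --rm hello-world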

Step 2: Configure Docker for Windows

  • Start Docker Desktop: Launch Docker Desktop from your Start menu and wait for the engine to finish starting.
  • Adjust Resources: Go to Docker Settings > Resources and allocate at least 4 CPUs and 8GB RAM for vLLM (with the WSL 2 backend these limits are set in a .wslconfig file instead; see the sketch at the end of this step).
  • Clone the vLLM repository:
git clone https://github.com/vllm-project/vllm.git
cd vllm
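As noted above, when Docker Desktop uses the WSL 2 backend, CPU and memory limits are controlled by WSL itself rather than the Resources sliders. A minimal sketch of a %UserProfile%\.wslconfig file (the values are illustrative; size them to your machine):

[wsl2]
memory=16GB
processors=8

Run wsl --shutdown afterwards so the new limits take effect when Docker Desktop restarts.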

Step 3: Create Dockerfile for VLLM

  • Create a Dockerfile: In the vLLM directory, create a Dockerfile that sets up the environment for vLLM and Llama 3.1 70B.
# Use an official PyTorch image with CUDA support
# (recent vLLM releases expect PyTorch 2.x / CUDA 12.x; adjust this tag as needed)
FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
# Install system dependencies and upgrade pip
RUN apt-get update && \
    apt-get install -y git python3-pip && \
    pip install --upgrade pip
# Install vLLM dependencies
COPY . /vllm
WORKDIR /vllm
RUN pip install -r requirements.txt
# Install additional dependencies for Llama
# (Llama 3.1 needs a recent transformers release; adjust the version as needed)
RUN pip install transformers==4.43.0
# Set the entrypoint to start the server or run scripts
ENTRYPOINT ["python", "scripts/run_vllm.py"]

Step 4: Build and Run Docker Container

  • In the terminal, run the following command to build the Docker image:
docker build -t vllm-llama3.1-70b .
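You can confirm the build succeeded by listing the tagged image:

docker images vllm-llama3.1-70b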

Step 5: Run the Docker Container

  • After building the image, run a container from it, adjusting the command and parameters to your needs. Test it by accessing the model's endpoints and then check the throughput; at that point you can load and run Llama 3.1 70B.
docker run --gpus all -it --rm --name vllm-llama3.1-70b-container vllm-llama3.1-70b
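If your entrypoint script starts vLLM's OpenAI-compatible server (which listens on port 8000 by default), a sketch of publishing the port and probing the endpoint from a second terminal looks like this:

docker run --gpus all --rm -p 8000:8000 --name vllm-llama3.1-70b-container vllm-llama3.1-70b
curl http://localhost:8000/v1/models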

Tips for Running vLLM on Windows Docker

  • Check Docker Settings: Ensure Docker Desktop is correctly installed and running. Verify that Docker is configured to use Linux containers.
  • Image and Dependencies: Ensure the vLLM Docker image built correctly; you can check with docker images. If there are issues with the image, try rebuilding it: docker build -t vllm-llama3.1-70b .
  • Custom Models: Modify the Dockerfile and requirements.txt to include additional libraries or custom vLLM models.
  • Volume Mounting: Use Docker volumes to persist data and manage large datasets efficiently (see the example below).
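For the volume-mounting tip, a common pattern is to mount your Hugging Face cache into the container so downloaded model weights persist across runs. A sketch from a Windows Command Prompt (PowerShell users would use $env:USERPROFILE instead):

docker run --gpus all --rm -v %USERPROFILE%\.cache\huggingface:/root/.cache/huggingface vllm-llama3.1-70b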

If working through the vLLM deployment steps above feels cumbersome, you can instead find a packed image on DockerHub and upload it to the Template of the Novita AI Instance. Then you can deploy vLLM simply.
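For example, vLLM publishes an official server image, vllm/vllm-openai, on Docker Hub. A hedged sketch of running it directly (Llama 3.1 is a gated model, so a Hugging Face token is assumed, and a 70B model generally needs several GPUs, hence the illustrative --tensor-parallel-size):

docker run --gpus all -p 8000:8000 -e HUGGING_FACE_HUB_TOKEN=<your_token> vllm/vllm-openai:latest --model meta-llama/Meta-Llama-3.1-70B-Instruct --tensor-parallel-size 4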


Conclusion

Running vLLM on Windows using Docker offers a reliable environment for NLP model development and deployment. This guide helps set up a containerized environment for simplified dependency management and deployment, minimizing software conflicts and versioning issues. For support, check Docker’s official documentation and the vLLM community forums. Integrating Docker with vLLM streamlines your workflow and ensures efficient model performance across platforms.

FAQs

Does vLLM run locally?

Yes. vLLM downloads models automatically and stores them in your Hugging Face cache directory. When you run vLLM locally, its API server listens on a default IP address and port (typically http://localhost:8000).

Does vLLM require CUDA?

Yes, the standard vLLM build targets NVIDIA GPUs and requires CUDA. vLLM supports GPUs with compute capability 7.0 or higher, and CUDA 11.8 or higher is required for GPUs with compute capability 9.0 (such as the H100).

Can Docker run directly on Windows?

Yes. Docker Desktop is compatible with Windows (x86-64) operating systems, and it can run both Linux containers (via the WSL 2 or Hyper-V backend) and native Windows containers.

How can I tell if the Docker daemon is running on Windows?

To check if the Docker daemon is running on Windows, look for the Docker Desktop icon in the system tray or run “docker info” in a PowerShell/Command Prompt window to display Docker environment information if the daemon is active.

Is Docker for Windows free?

Docker Desktop is free for small businesses (with fewer than 250 employees AND less than $10 million in annual revenue), personal use, education, and non-commercial open-source projects. For professional use beyond these categories, a paid subscription is necessary.

Originally published at Novita AI

Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
