Artificial Intelligence Series (Chapter 0): Local Deployment of Large Models
The Ice-breaking Topic
Why should we try to deploy AI models in a local environment?
Data Privacy and Security
Protecting Sensitive Data: When dealing with sensitive data (such as healthcare, finance, or personal privacy), deploying AI models locally can prevent data from being uploaded to the cloud, thereby reducing the risk of data breaches.
Compliance Requirements: Some industries have stringent regulations regarding data storage and processing, such as the GDPR in Europe, which encourages businesses to deploy AI models locally.
Latency and Performance
Reducing Network Latency: Deploying models locally can reduce the latency associated with data transmission, particularly for applications that require real-time or low-latency processing, such as autonomous driving or video surveillance.
High-Performance Computing: Local deployment allows full utilisation of local hardware resources, providing higher computational performance. The cloud may face limitations in bandwidth and computational resources, while local hardware can be optimised and tailored to the specific needs of the task.
Cost Control
Avoiding Cloud Service Costs: While cloud computing is flexible, it can become costly over the long term, especially in scenarios with high computational and storage demands. Local deployment can help businesses reduce their reliance on cloud resources and lower costs.
On-Demand Hardware Expansion: Businesses can purchase appropriate hardware resources as needed, avoiding the recurring fees charged by cloud service providers.
Control and Customisation
Complete Control: Deploying AI models locally ensures full control over both hardware and software, allowing businesses to customise and optimise the deployment environment to meet specific needs.
Flexibility: Companies have the freedom to choose the computing platform (e.g., GPU, TPU, CPU) and system architecture, enabling finer-grained optimisation.
Offline Operation and High Availability
No Reliance on the Internet: For certain application scenarios (such as remote areas or offshore platforms), local deployment ensures that the system continues to function without an internet connection.
High Availability: Local deployment reduces dependence on external cloud service providers, helping businesses avoid disruptions caused by network issues or cloud service outages.
Environment Adaptation
Optimising Hardware Resources: AI models deployed locally can be optimised according to the specific hardware environment (e.g., dedicated GPUs or custom hardware accelerators).
Reducing Shared Cloud Resources: Cloud platforms often share resources among multiple customers, which can lead to performance fluctuations. Local deployment can avoid this resource contention.
Deploying Large Models Locally
Here, I would like to introduce how to deploy Ollama by using Docker. In fact, there are many ways to deploy it, e.g. installing it on a physical machine with the curl command on Linux, or downloading the compressed packages and unzipping them to install on Windows/macOS. What's more, you could also install the ollama Python client via pip (see the note below). For more detailed information, please visit the ollama GitHub.
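On the pip route, note that the ollama package on PyPI is the Python client library for talking to an Ollama server that is already running, not the server itself. A minimal sketch:

# Install the Ollama Python client library (talks to a running Ollama server)
pip install ollama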
Deploying Ollama
Deploying in Docker
The official Ollama Docker image ollama/ollama is available on Docker Hub.
# CPU Only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
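If you have an NVIDIA GPU, the same image can also be started with GPU support; a minimal sketch based on the Docker Hub instructions, assuming the NVIDIA Container Toolkit is already installed on the host:

# NVIDIA GPU (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama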
I have already introduced the relevant Docker knowledge before; you could review it here.
After pulling the image from Docker Hub, you could use the command ollama -v to check the version of the running Ollama.
# Check the version of ollama inside the container
docker exec -it ollama ollama -v
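Since the container maps port 11434 to the host, you can also confirm the server is reachable over HTTP; a quick sketch assuming the default port mapping used above:

# Query the Ollama HTTP API for its version
curl http://localhost:11434/api/version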
Deploying on a physical machine
Different operating systems have different methods for deploying Ollama on a physical machine. Here, we will use the curl command for local deployment on Linux.
# Download the install script from the official website and execute it
curl -fsSL https://ollama.com/install.sh | sh
After the installation completes, you could check the status of the ollama service.
# Use the systemctl command to check the service status
systemctl status ollama
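If the service is not active, here is a short sketch of the usual systemd commands, assuming the install script registered an ollama unit (which it does on most systemd-based distributions):

# Start the service and enable it at boot
sudo systemctl enable ollama
sudo systemctl start ollama
# Follow the service logs if something goes wrong
journalctl -u ollama -f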
Similarly, you could use the command ollama -v
to check the version of ollama you have installed.
# Check the version of ollama
ollama -v
For even more information about downloading ollama, please refer to the official website.
Pulling models from the ollama library
There are plenty of models you could pull from the ollama library. Please choose a model according to your own server's performance and configuration. For example, with a 12-core CPU and 32 GB of memory, deepseek-r1:8b is recommended.
# Pull the model from the ollama library
ollama pull deepseek-r1:8b
If you pulled the model successfully, you could use the command ollama list (or its alias ollama ls) to confirm the result.

# List the models downloaded locally
ollama list
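A few other model-management commands may come in handy at this point; a brief sketch using the standard ollama CLI (ollama ps requires a recent version):

# Show details of a downloaded model
ollama show deepseek-r1:8b
# Show models currently loaded in memory
ollama ps
# Remove a model you no longer need
ollama rm deepseek-r1:8b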
Interact with the local models
There are two ways to interact with local models.
1. Use the ollama run command in the CLI.

ollama run [the model name:tag]

It will guide you through an interactive session in the CLI. You could use ctrl+d or /bye to quit the interaction.

# Run the local LLM you specify
ollama run deepseek-r1:8b
>>> Send a message (/? for help)

2. Visit and ask questions through the open-webui front-end website.
Deploying open-webui in Docker:

# If Ollama is on your computer, use this command
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
# If Ollama is on a Different Server, use this command
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Deploying open-webui via pip:

# Install
pip install open-webui
# Execute
open-webui serve
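If you went with the Docker route, you can quickly confirm that the web UI came up before logging in; a short sketch using standard Docker commands:

# Check that the open-webui container is running
docker ps --filter name=open-webui
# Follow its logs if the page does not load
docker logs -f open-webui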
By the way, deploying in Docker is recommended.
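Both approaches ultimately talk to Ollama's HTTP API, which listens on port 11434 by default. If you prefer scripts over a chat window, you can also query the model directly; a minimal sketch, assuming the deepseek-r1:8b model pulled earlier and the default port:

# Ask the local model a question through the Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'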
Login and Question offline
The final step is to log in to the open-webui website and start asking questions!
You could use http://ip:port or http://localhost:port to visit your local UI; with the Docker command above, the host port is 3000.
Set up the administrator account (name + email + password), and then you can log in.
Select any of the models you've already downloaded locally.
You could also select more than one model and compare the results at the same time.
A demo of the interface is shown below.
Congratulations! Now you can have a good time experimenting with large models locally :)