How do I Host an Open Source AI Chatbot on Linode?

nginx docker how-to open-source ai chatbot llama

Linode 7 months ago Linode Staff

I would like to host an open source AI chatbot in a compute instance on Linode. I know Meta's Llama 2 is available for download, but can I run it on a Linode?

1 Reply

tlambert 7 months ago Linode Staff

Yes! You can host Llama 2 in a compute instance. In this guide I'll give instructions for hosting not only the model but also a slick front end called Ollama Web UI, running in a Docker container behind an Nginx reverse proxy.

What You'll Need

While I'm sure you've heard that LLMs and AI in general rely on GPUs, the great thing about hosting a model yourself is that a GPU isn't required: the model can run on CPU alone, just more slowly. These are the requirements for this guide:

A Dedicated 16GB Plan that has been set up and secured (Debian/Ubuntu)
Custom Domain (optional)

While Llama 2 would likely run faster on an instance with more computing power, I've been able to run the model with 13 billion parameters on a Dedicated 16GB plan with Debian 11. You can deploy a Docker app from the Linode Marketplace to save yourself a step, but installing Docker manually isn't too difficult if you're doing this on an instance you already have running. If you want to use a custom domain, you'll also want to get a domain name hosted with us and create an A record pointing to your instance's IP address if you haven't already.

Install Ollama

To get started with the Llama 2 LLM, we will install Ollama on the instance - a piece of software that allows you to run the LLM locally or in the cloud. The GitHub repo for Ollama has information on the capabilities of the service.

curl -fsSL https://ollama.ai/install.sh | sh

After the install, you'll see a message saying Ollama will be running in CPU-only mode which, as I mentioned earlier, has an effect on overall performance but not on capabilities.

>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode.

You can get started by downloading the Llama 2 model with this command:

ollama run llama2

This will allow you to run the Llama 2 LLM chatbot right from your terminal:

>>> Hello Llama! How are you?

Hello there, human! *giggles* I'm feeling quite fluffy and content today. It's a beautiful day in the Andes, and I'm enjoying the sunshine and fresh mountain air. *nuzzles nose* How about you? What brings you to this neck of the woods?

>>> Send a message (/? for help)

…ok then!

If this suits your needs then you can call it good here. You can pull other models or create "Modelfiles" that allow you to customize the model with a prompt. You can check out some examples here on ollamahub.com.
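As a rough sketch of what a Modelfile can look like, the snippet below writes one that starts from the llama2 model and sets a custom system prompt, then builds it with `ollama create`. The model name `plain-llama`, the prompt text, and the temperature value are placeholders I picked for illustration, not anything from the original setup.

```shell
# Write a minimal Modelfile: a base model plus a custom system prompt.
# "plain-llama", the prompt text, and the temperature are illustrative placeholders.
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0.7
SYSTEM """
You are a concise assistant. Answer plainly and skip the role-play.
"""
EOF

# Build the custom model only if the ollama CLI is available on this machine.
if command -v ollama >/dev/null 2>&1; then
  ollama create plain-llama -f Modelfile
fi
```

Once the model is built, `ollama run plain-llama` should start a chat session that uses the new system prompt.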

Install Web UI

To make the chatbot a bit more user friendly, let's give it a browser interface that's reachable over the internet. We'll do this by deploying the Ollama Web UI Docker container.

Install Docker

Since the Ollama Web UI is containerized, you will need to install Docker. You can follow the instructions for installing Docker on Debian here if you don't have it installed already.

Once Docker is installed, we can deploy the Web UI with the following command:

sudo docker run -d --network=host -v ollama-webui:/app/backend/data -e OLLAMA_API_BASE_URL=http://127.0.0.1:11434/api --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main

This makes the container accessible on <your.ip.address>:8080. When you open that page in your browser you'll see the login page for the Ollama Web UI. Once you create your login username and password you'll be good to go!
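If the page doesn't load, it can help to check from the instance itself whether both services are answering before looking at the firewall or DNS. The sketch below assumes the ports used in this guide (11434 for the Ollama API, 8080 for the Web UI); `check_endpoint` is just a helper name made up for this example.

```shell
# Print "up" if the URL answers without an HTTP error status, otherwise "down".
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$1" 2>/dev/null; then
    echo "up"
  else
    echo "down"
  fi
}

# Ports below are the ones used in this guide; adjust if you changed them.
echo "Ollama API (port 11434): $(check_endpoint http://127.0.0.1:11434/)"
echo "Web UI     (port 8080):  $(check_endpoint http://127.0.0.1:8080/)"
```

If the Ollama API shows as down, the Web UI will load but won't be able to list or talk to any models.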

Note

If your distro has a default firewall (such as UFW) or you've enabled a firewall, you will need to allow connections on port 8080 in order to connect to the Web UI Docker container.
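For UFW specifically, opening a port might look like the sketch below. It assumes UFW is the firewall in use and that your user has sudo rights; `open_port` is a helper name invented for this example.

```shell
# Allow inbound TCP connections on the given port through UFW,
# then print the resulting rule list so you can confirm the change.
open_port() {
  sudo ufw allow "$1/tcp" && sudo ufw status
}

# Usage: open_port 8080
```

The same helper covers the HTTP (80) and HTTPS (443) ports that the reverse proxy steps ask you to allow.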

Configure the Reverse Proxy

Now, we'll install Nginx so we can connect to Ollama securely using a reverse proxy. First, install Nginx:

sudo apt install nginx -y

Next, create a configuration file for your reverse proxy using your preferred text editor. I like vim:

sudo vim /etc/nginx/conf.d/llama.conf

The file should look like this:

server {
  listen 80;
  listen [::]:80;

  server_name example.com;

  location / {
      proxy_pass http://localhost:8080/;
  }
}

You'll want to change example.com to your custom domain name. Make sure you have allowed connections to the HTTP port (80) so connections can be made to your web server. To ensure Nginx is configured properly, you can test it with this command:

sudo nginx -t

Assuming no errors are returned, you can then reload the service with the following command and connect to the Ollama Web UI using your custom domain name:

sudo nginx -s reload

Enable SSL

Finally, we will secure the connection to your chatbot by enabling HTTPS connections to your web server. To do this we will use a Let's Encrypt certificate managed by Certbot. You can use this guide to install Certbot on your compute instance and enable the certificates necessary for HTTPS connections. Be sure to edit your firewall rules to allow connections on the HTTPS port (443).
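For reference, one common way to do this on Debian or Ubuntu is to install Certbot and its Nginx plugin from the distro packages. The linked guide may use a different install method, so treat this as a sketch rather than the guide's exact steps; `enable_https` is a made-up helper name, and it assumes your domain's A record already points at the instance.

```shell
# Install Certbot with its Nginx plugin, then request a certificate
# and let Certbot update the matching Nginx server block.
enable_https() {
  domain="$1"
  if [ -z "$domain" ]; then
    echo "usage: enable_https <domain>" >&2
    return 1
  fi
  sudo apt install -y certbot python3-certbot-nginx &&
    sudo certbot --nginx -d "$domain"
}

# Usage: enable_https example.com
```

Certbot will prompt for an email address and agreement to the terms of service the first time it runs.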

Voila! Now you have your very own open source chatbot in the cloud you can chat with whenever you'd like.
