Published 2024-01-14.
Last modified 2025-09-20.
Time to read: 5 minutes.
I've been playing with large language models (LLMs) online and locally. LLMs running on my local machines are not as powerful or as fast as the large models running on expensive hardware, but I have complete control over them, with no extra cost, censorship, restrictions, or privacy concerns.
Ollama is a way to run LLMs locally, using a client-server architecture. Ollama wraps LLMs in a server, and clients query that server.
Ollama is an open-source tool, built in Go, for running and packaging generative machine learning models. Ollama clients can include:
- Program code via the Ollama server REST interface
- Text chat
- Web interface
Any Ollama client can access any Ollama server.
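Because the server exposes a plain HTTP API on port 11434 by default, every client ultimately boils down to HTTP requests. As a quick sketch, assuming a server is already running locally, this lists the models it has installed:
$ curl -s http://localhost:11434/api/tags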
Installation
Installation instructions are simple.
On Windows with WSL, install the Linux version of Ollama inside WSL. Linux installation and update look like this:
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink '/etc/systemd/system/default.target.wants/ollama.service' → '/etc/systemd/system/ollama.service'.
>>> NVIDIA GPU installed.
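A quick sanity check after installation, assuming systemd is enabled in your WSL distribution, is to confirm that the CLI responds and that the service is running:
$ ollama --version
$ systemctl is-active ollama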
Ollama Model Format vs. Hugging Face Model Format
GGUF (GPT-Generated Unified Format) is a file format for storing large language models and other AI models, optimized for fast loading and efficient, quantized inference on local hardware. It bundles model metadata and tensors into a single binary file and supports various quantization levels to reduce memory usage.
Although Ollama can directly use any GGUF-formatted model, caveats exist.
Your first source of Ollama-compatible models should be ollama.com.
Hugging Face provides models in its own format, and some of its models are also available in GGUF format. Ollama does not support models in the native Hugging Face format. Conversion to GGUF can take a long time and requires an understanding of the moving parts.
Most models found on Hugging Face were originally released in PyTorch tensor format and later converted to GGUF. The conversion can mangle some parameters. This is why your primary source for models should be ollama.com.
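That said, if you have a GGUF file you trust, Ollama can import it directly through a Modelfile. This is a minimal sketch; my-model.gguf is a hypothetical file name, and a real Modelfile would usually also define the model's prompt TEMPLATE and stop PARAMETER lines:
$ echo "FROM ./my-model.gguf" > Modelfile
$ ollama create my-model -f Modelfile
$ ollama run my-model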
Installing an Ollama-Compatible Model
To install or update a model without running it, type ollama pull, followed by the name of the model.
Ollama's default model tags are usually Q4 (4-bit quantized), which is faster but can be much less accurate than Q8 (8-bit quantized) models. Install Q8 versions if possible.
$ ollama pull deepseek-r1:8b  # install or update
pulling manifest
pulling e6a7edc1a4d7: 100% ▕████████████████████████████▏ 5.2 GB/5.2 GB  63 MB/s  0s
pulling c5ad996bda6e: 100% ▕████████████████████████████▏  556 B
pulling 6e4c38e1172f: 100% ▕████████████████████████████▏ 1.1 KB
pulling ed8474dc73db: 100% ▕████████████████████████████▏  179 B
pulling f64cd5418e4b: 100% ▕████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
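To pull a Q8 build instead of the default, name the quantization explicitly in the tag. Available tags vary by model, so check the model's Tags page on ollama.com first; the tag below is only an example of the naming pattern:
$ ollama pull llama3:8b-instruct-q8_0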
You can install and run any Ollama-compatible model by typing ollama run, followed by the name of the model.
$ ollama run deepseek-r1:8b
Inspecting a Model
Inspect an installed model:
$ ollama show deepseek-r1:8b
  Model
    architecture        qwen3
    parameters          8.2B
    context length      131072
    embedding length    4096
    quantization        Q4_K_M

  Capabilities
    completion
    thinking

  Parameters
    stop           "<|begin▁of▁sentence|>"
    stop           "<|end▁of▁sentence|>"
    stop           "<|User|>"
    stop           "<|Assistant|>"
    temperature    0.6
    top_p          0.95

  License
    MIT License
    Copyright (c) 2023 DeepSeek
    ...
Just display the quantization:
$ ollama show deepseek-r1:8b | grep quantization
    quantization        Q4_K_M
My Favorite Models
Following are some open-source models that I have downloaded and played with on my PC. Larger versions could be run in the cloud, with providers like ShadowPC or AWS spot instances.
Model | Parameters | Purpose |
---|---|---|
codellama:7b | 7B | Old but good: General code synthesis and understanding using Llama 2. |
deepseek-r1:8b | 8B, Q4 | Uses the Qwen architecture; best for math-focused tasks with resource constraints. Outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Performance is claimed to be similar to OpenAI o3 and Gemini 2.5 Pro. |
llama3:8b | 8B | Llama 3 is very capable. |
luna-ai-llama2-uncensored-gguf | 7B | |
llama2:13b | 13B | |
llama-3.1-8b-instruct | 8B | |
mistral | 7B | |
mistral-small3.2 | 24B | |
Each model has unique attributes. Some are designed for describing images, while others are designed for generating music or for other special purposes.
The 70B-parameter model really puts a strain on my computer, and takes much longer than the smaller models to yield a result.
Command Line Start
You can start the server from the command line, if it is not already running as a service:
$ ollama serve
2024/01/14 16:25:20 images.go:808: total blobs: 0
2024/01/14 16:25:20 images.go:815: total unused blobs removed: 0
2024/01/14 16:25:20 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
2024/01/14 16:25:21 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/14 16:25:21 gpu.go:88: Detecting GPU type
2024/01/14 16:25:21 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/14 16:25:21 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1]
2024/01/14 16:25:21 gpu.go:94: Nvidia GPU detected
2024/01/14 16:25:21 gpu.go:135: CUDA Compute Capability detected: 8.6
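As the log shows, the server only listens on 127.0.0.1:11434 by default. If you want clients on other machines to reach it, set the OLLAMA_HOST environment variable before starting the server. A sketch; note that the API has no authentication, so only do this on a trusted network:
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve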
Ollama Models
Ollama loads models on demand and unloads them after they have been idle for a while. That means you do not have to restart Ollama after installing a new model or removing an existing one.
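Recent Ollama versions can show which models are currently loaded into memory, and when each will be unloaded, which makes this on-demand behavior easy to observe:
$ ollama ps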
My workstation has 64 GB RAM, a 13th-generation Intel i7, and a modest NVIDIA 3060. I decided to try the biggest model to see what might happen, so I downloaded the Llama 2 70B model with the following incantation. (Spoiler: an NVIDIA 4090 would have been a better video card for this model, and it would still be slow.)
$ ollama run llama2:70b
pulling manifest
pulling 68bbe6dc9cf4... 100% ▕████████████████████████████████████▏  38 GB
pulling 8c17c2ebb0ea... 100% ▕████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕████████████████████████████████████▏   91 B
pulling 7c96b46dca6c... 100% ▕████████████████████████████████████▏  558 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
I played around to learn what the available commands were. For more information, see Tutorial: Set Session System Message in Ollama CLI by Ingrid Stevens.
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> Send a message (/? for help)
>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template

>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:70b

FROM /usr/share/ollama/.ollama/models/blobs/sha256:68bbe6dc9cf42eb60c9a7f96137fb8d472f752de6ebf53e9942f267f1a1e2577
TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"

>>> /show system
No system message was specified for this model.
>>> /show template
[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]

>>> /bye
USER: and ASSISTANT: are helpful when writing a request for the model to reply to.
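For example, a one-shot request can embed those markers directly in the prompt passed to ollama run. This is only a sketch of the idea, not something the model requires:
$ ollama run llama2:70b 'USER: Why is the sky blue? ASSISTANT:'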
By default, Ollama models are stored in these directories:
- Linux: /usr/share/ollama/.ollama/models
- macOS: ~/.ollama/models
The Ollama library has many models available. OllamaHub has more. For applications that may not be safe for work, there is an equivalent uncensored Llama2 70B model that can be downloaded. Do not try to work with this model unless you have a really powerful machine!
$ ollama pull llama2-uncensored:70b
pulling manifest
pulling abca3de387b6... 100% ▕█████████████████████████████████████▏  38 GB
pulling 9224016baa40... 100% ▕█████████████████████████████████████▏ 7.0 KB
pulling 1195ea171610... 100% ▕█████████████████████████████████████▏ 4.8 KB
pulling 28577ba2177f... 100% ▕█████████████████████████████████████▏   55 B
pulling ddaa351c1f3d... 100% ▕█████████████████████████████████████▏   51 B
pulling 9256cd2888b0... 100% ▕█████████████████████████████████████▏  530 B
verifying sha256 digest
writing manifest
removing any unused layers
success
I then listed the models on my computer in another console:
$ ollama list
NAME          ID              SIZE    MODIFIED
llama2:70b    e7f6c06ffef4    38 GB   9 minutes ago
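Each of these 70B pulls occupies about 38 GB of disk, so it is worth removing models you no longer need:
$ ollama rm llama2-uncensored:70b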
Running Queries
Ollama queries can be run in many ways.
I used curl, jq and fold to write my first query from a bash prompt. The -s option for curl prevents the progress meter from cluttering up the screen, and the jq filter removes everything from the response except the desired text. The fold command wraps the text response to a width of 72 characters.
$ curl -s http://localhost:11434/api/generate -d '{
    "model": "llama2:70b",
    "prompt": "Why is there air?",
    "stream": false
  }' | jq -r .response | fold -w 72 -s
Air, or more specifically oxygen, is essential for life as we know it.
It exists because of the delicate balance of chemical reactions in
Earth’s atmosphere, which has allowed complex organisms like ourselves
to evolve.

But if you’re asking about air in a broader sense, it serves many
functions: it helps maintain a stable climate, protects living things
from harmful solar radiation, and provides buoyancy for various forms
of life, such as fish or birds.
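The newer /api/chat endpoint works the same way but accepts a list of role-tagged messages instead of a single prompt, which is more convenient for multi-turn conversations. A sketch against the same server and model:
$ curl -s http://localhost:11434/api/chat -d '{
    "model": "llama2:70b",
    "messages": [
      { "role": "user", "content": "Why is there air?" }
    ],
    "stream": false
  }' | jq -r .message.content | fold -w 72 -s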
Describing Images
I wrote this method to describe images.
def describe_image(image_filename)
  @client = Ollama.new(
    credentials: { address: @address },
    options: {
      server_sent_events: true,
      temperature: @temperature,
      connection: { request: { timeout: @timeout, read_timeout: @timeout } },
    }
  )
  result = @client.generate(
    {
      model: @model,
      prompt: 'Please describe this image.',
      images: [Base64.strict_encode64(File.read(image_filename))],
    }
  )
  puts result.map { |x| x['response'] }.join
end
The results were ridiculous: an example of the famous hallucinations that LLMs entertain their audience with. As the public becomes enculturated to these hallucinations, we may come to prefer them over human comedians. Certainly there will be a lot of material for human comedians to fight back with. For example, when describing the photo of me at the top of this page:
Another attempt, with an equally ridiculous result:
The llava model is supposed to be good at describing images, so I installed it and tried again, with excellent results:
$ ollama pull llava:13b
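With llava installed, you can also describe an image straight from the CLI by including the image's path in the prompt; the path below is a placeholder:
$ ollama run llava:13b 'Please describe this image: /path/to/photo.jpg'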
You can try the latest LLaVA model online.
Documentation
- CLI Reference
- Ollama API
  - There are lots of controls for various models.
- Ollama Web UI
  - An Apple M1 Air works great.
- Crafting Conversations with Ollama-WebUI: Your Server, Your Rules