BCS Blog

COBOL, z/OS performance measurement, and systems programming updates.

What Is Ollama

By Arch Brooks on April 17, 2026 in Programming

Ollama is a tool that lets you run large language models (LLMs) locally on your machine instead of calling cloud APIs.


What it actually does

  • Downloads AI models (like LLaMA, Mistral, DeepSeek)
  • Runs them on your CPU/GPU
  • Exposes a simple CLI + REST API
  • No internet needed after models are pulled
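The CLI and the REST API drive the same engine. For example, a running server answers `GET /api/tags` with the models you've pulled locally. A minimal Python sketch of reading that endpoint — the helper names here are mine, not Ollama's:

```python
import json
from urllib.request import urlopen

def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    data = json.loads(tags_json)
    return [m["name"] for m in data.get("models", [])]

def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are pulled locally."""
    with urlopen(f"{host}/api/tags") as resp:
        return parse_model_names(resp.read().decode())

# What an (abridged) /api/tags payload looks like:
sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:latest"}]}'
print(parse_model_names(sample))  # ['llama3:latest', 'mistral:latest']
```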

Why people use it

  • Privacy → everything stays local
  • No API costs
  • Fast iteration for dev work
  • Works well for:
    • code generation
    • chatbots
    • local automation
    • embedding + RAG systems
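For the embedding/RAG use case, Ollama also serves an embeddings endpoint (`POST /api/embeddings`). A hedged sketch of the request/response shape as I understand it — the helper names and the `nomic-embed-text` model choice are illustrative, not prescribed:

```python
import json

def build_embedding_request(text: str, model: str = "nomic-embed-text") -> bytes:
    """Body for POST /api/embeddings. nomic-embed-text is one common
    embedding model you can pull; swap in whichever you use."""
    return json.dumps({"model": model, "prompt": text}).encode()

def parse_embedding(body: str) -> list[float]:
    """The server replies with a JSON object carrying an "embedding" list."""
    return json.loads(body)["embedding"]

req = build_embedding_request("local RAG over my notes")
print(json.loads(req)["model"])  # nomic-embed-text
```

Store the returned vectors in whatever index you like (SQLite, FAISS, etc.) and you have a fully local RAG pipeline.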

Basic usage

 
# install (Arch example via AUR)
yay -S ollama

# start service
ollama serve

# run a model
ollama run llama3
 

Pull a specific model

 
ollama pull deepseek-coder
ollama run deepseek-coder
 

Call it from your app

 
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain Mark 9:36-37"
}'
 
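Note that `/api/generate` streams its reply back as newline-delimited JSON chunks by default (pass `"stream": false` in the request body if you want one object). A sketch of reassembling the stream in Python — the sample payload below is illustrative:

```python
import json

def collect_stream(ndjson_body: str) -> str:
    """Each streamed line carries a "response" fragment; the final
    line has "done": true. Concatenate fragments for the full answer."""
    text = []
    for line in ndjson_body.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Shape of a (truncated) streamed reply:
sample = (
    '{"response": "Jesus places a child ", "done": false}\n'
    '{"response": "among the disciples...", "done": true}\n'
)
print(collect_stream(sample))
```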

Models you can run

Common ones:

  • llama3 (general)
  • mistral (fast)
  • deepseek-coder (code)
  • qwen2.5-coder (strong coding)
  • phi (lightweight)

Important limitation

  • Runs smaller, open-weight models, not proprietary hosted models like GPT-5
  • Quality depends on:
    • your hardware (RAM/VRAM)
    • model size (7B, 13B, 70B, etc.)
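As a rough back-of-envelope rule (my estimate, not an official Ollama figure), the weights alone need about params × bits-per-weight ÷ 8 bytes, so a 4-bit-quantized 7B model wants roughly 3.5 GB of RAM/VRAM before context cache and runtime overhead:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough memory for model weights alone: params x bits / 8 bytes.
    Common Ollama quantizations are around 4 bits per weight; the KV
    cache and runtime overhead come on top of this."""
    return params_billion * bits_per_weight / 8

for size in (7, 13, 70):
    print(f"{size}B @ 4-bit ~ {approx_weight_gb(size):.1f} GB")
# 7B ~ 3.5 GB, 13B ~ 6.5 GB, 70B ~ 35.0 GB
```

This is why a 70B model is out of reach on most laptops while 7B models run comfortably.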

Bottom line

Ollama = local AI engine
Think:

“Docker for LLMs running on my machine”

Permalink: https://bcs.archman.us/blog.php?slug=what-is-ollama

