What Is Ollama
By Arch Brooks on April 17, 2026 in Programming
Ollama is a tool that lets you run large language models (LLMs) locally on your machine instead of calling cloud APIs.
What it actually does
- Downloads AI models (like LLaMA, Mistral, DeepSeek)
- Runs them on your CPU/GPU
- Exposes a simple CLI + REST API
- No internet needed after models are pulled
Why people use it
- Privacy → everything stays local
- No API costs
- Fast iteration for dev work
- Works well for:
  - code generation
  - chatbots
  - local automation
  - embedding + RAG systems
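The embedding + RAG use case above can be sketched against Ollama's REST embeddings endpoint. This is a minimal sketch, assuming the server is running on its default port (11434) and that an embedding-capable model has been pulled; the model name `nomic-embed-text` is just an illustrative choice. The cosine helper is plain Python.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def embed(text, model="nomic-embed-text"):
    """Ask the local Ollama server for an embedding.

    Assumes the server is running and the model has been pulled.
    """
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


if __name__ == "__main__":
    try:
        docs = ["Ollama runs models locally.", "The weather is sunny."]
        vecs = [embed(d) for d in docs]
        print("similarity:", cosine(vecs[0], vecs[1]))
    except OSError:
        print("Ollama server not reachable; start it with `ollama serve`.")
```

In a RAG system you would embed every document chunk once, store the vectors, and at query time rank chunks by cosine similarity to the embedded question.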
Basic usage (what you care about)
# install (Arch example via AUR)
yay -S ollama
# start service
ollama serve
# run a model
ollama run llama3
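`ollama run` can also be driven non-interactively from a script by passing the prompt as an argument. A minimal sketch, assuming the `ollama` binary is on your PATH; it degrades gracefully when it isn't.

```python
import shutil
import subprocess


def build_cmd(prompt, model="llama3"):
    """argv for a one-shot, non-interactive `ollama run` call."""
    return ["ollama", "run", model, prompt]


def ask(prompt, model="llama3"):
    """Run the prompt through the local CLI and return the reply text."""
    result = subprocess.run(build_cmd(prompt, model),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    if shutil.which("ollama"):
        print(ask("Say hi in five words."))
    else:
        print("ollama binary not found on PATH")
```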
Pull a specific model
ollama pull deepseek-coder
ollama run deepseek-coder
Call it from your app (like your Bootstrap modal app)
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Explain Mark 9:36-37"
}'
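By default `/api/generate` streams its reply back as one JSON object per line, each carrying a `response` chunk and a `done` flag. A minimal Python sketch of the same request, assuming the default port; `stitch_response` just concatenates the streamed chunks.

```python
import json
import urllib.request


def stitch_response(lines):
    """Concatenate the 'response' fields of Ollama's streamed JSON lines."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)


def generate(prompt, model="llama3",
             url="http://localhost:11434/api/generate"):
    """POST a generate request to a local Ollama server and return the text."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return stitch_response(line.decode() for line in resp)


if __name__ == "__main__":
    try:
        print(generate("Explain Mark 9:36-37"))
    except OSError:
        print("Start the server first: ollama serve")
```

This is what a backend endpoint behind a web UI (like a Bootstrap modal) would call.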
Models you can run
Common ones:
- llama3 (general)
- mistral (fast)
- deepseek-coder (code)
- qwen2.5-coder (strong coding)
- phi (lightweight)
Important limitation
- Runs smaller, open-weight models — not closed frontier models like GPT-5
- Quality depends on:
- your hardware (RAM/VRAM)
- model size (7B, 13B, 70B, etc.)
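A rough rule of thumb for whether a model fits your hardware: weights alone need roughly (parameter count × bits per weight ÷ 8) bytes. This is a back-of-the-envelope sketch that ignores the KV cache and runtime overhead, so treat the numbers as a floor.

```python
def approx_weights_gb(params_billion, bits_per_weight=4):
    """Rough memory needed just for the weights of a quantized model.

    Ignores KV cache and runtime overhead, so treat it as a floor.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for size in (7, 13, 70):
    print(f"{size}B @ 4-bit: ~{approx_weights_gb(size):.1f} GB")
# → 7B @ 4-bit: ~3.5 GB
# → 13B @ 4-bit: ~6.5 GB
# → 70B @ 4-bit: ~35.0 GB
```

So a 7B model at 4-bit quantization fits comfortably on a typical laptop, while a 70B model needs workstation-class RAM or VRAM.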
Bottom line
Ollama = local AI engine
Think:
“Docker for LLMs running on my machine”
Permalink: https://bcs.archman.us/blog.php?slug=what-is-ollama