
Local Models (Ollama & LM Studio)

Run models locally - zero API costs, complete privacy.


Ollama (Recommended)

Installation

# macOS
brew install --cask ollama
 
# Linux
curl -fsSL https://ollama.com/install.sh | sh
 
# Windows
# Download from https://ollama.com/download/windows
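
To confirm the install, check the CLI version and that the local server responds (the Linux install script typically registers the server as a service, the desktop apps start it automatically, and ollama serve starts it by hand):

# CLI should report a version
ollama --version

# Server answers on port 11434 with "Ollama is running"
curl http://localhost:11434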

Setup with agentful

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
# 2. Pull a model
ollama pull qwen2.5-coder:7b
 
# 3. Configure Claude Code
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
 
# 4. Run
claude
/agentful-start
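
Before starting claude, it can help to confirm the model answers through Ollama's API directly (adjust the model name if you pulled a different tag):

# One-off generation via Ollama's native API - should return a JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a one-line hello world in Python.",
  "stream": false
}'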

Recommended Models

# Best for coding (start here)
ollama pull qwen2.5-coder:7b     # 6GB VRAM
ollama pull qwen2.5-coder:14b    # 12GB VRAM
ollama pull qwen2.5-coder:32b    # 24GB VRAM
 
# Best for tool calling
ollama pull llama3.1:8b          # 6GB VRAM
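
To sanity-check that a model fits your hardware, run it once and see where Ollama placed it:

# One-shot prompt straight from the CLI
ollama run qwen2.5-coder:7b "Explain what a mutex is in one sentence."

# The PROCESSOR column shows whether the model is on GPU or CPU
ollama ps

If ollama ps reports CPU, the model is likely spilling out of VRAM; drop to a smaller tag.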

Advanced Config

Increase context window:
# Modelfile - save these lines as ./Modelfile
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768
PARAMETER temperature 0.7
 
# Apply
ollama create qwen-32k -f ./Modelfile
claude --model qwen-32k
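
To verify the custom model was created with the larger context window:

# qwen-32k should appear in the list
ollama list

# Prints the Modelfile, including the num_ctx parameter
ollama show qwen-32k --modelfile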
Persistent config:
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
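
If you switch between local and hosted models, a pair of small shell functions keeps the environments separate (agentful-local and agentful-cloud are hypothetical helper names, not part of agentful):

# Hypothetical helpers - add to ~/.bashrc or ~/.zshrc
agentful-local() {
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_BASE_URL=http://localhost:11434
}
agentful-cloud() {
  unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL
}

Run agentful-local before claude to use Ollama, or agentful-cloud to fall back to your Anthropic account.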

LM Studio

Installation

Download from https://lmstudio.ai (macOS, Windows, Linux)

Setup with agentful

LM Studio requires a LiteLLM proxy in front of it (Ollama doesn't):

# 1. Install LiteLLM
pip install 'litellm[proxy]'
 
# 2. Create config
cat > litellm-config.yaml <<EOF
model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: openai/qwen2.5-coder-7b
      api_base: http://localhost:1234/v1
      api_key: dummy
EOF
 
# 3. Start LM Studio server (port 1234)
 
# 4. Start LiteLLM proxy
litellm --config litellm-config.yaml --port 4000
 
# 5. Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000/anthropic
export ANTHROPIC_AUTH_TOKEN=sk-1234
 
# 6. Run
claude
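
If requests fail, test each hop on its own before launching claude; both LM Studio and LiteLLM expose an OpenAI-compatible /v1/models endpoint (the Bearer key below matches the ANTHROPIC_AUTH_TOKEN set above and may be optional if no master key is configured):

# LM Studio server should list the loaded model
curl http://localhost:1234/v1/models

# LiteLLM proxy should list the mapped model name (claude-sonnet-4-5)
curl http://localhost:4000/v1/models -H "Authorization: Bearer sk-1234"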
Why the extra step?
  • LM Studio exposes an OpenAI-compatible API
  • Claude Code speaks the Anthropic API
  • LiteLLM translates between the two formats (compared below)
  • Ollama has native Anthropic API support, so it needs no proxy
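
To make the difference concrete, here is roughly the same request in each format. These curls are illustrative only; the model names and routes simply follow the config above:

# OpenAI format - what LM Studio understands (POST /v1/chat/completions)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder-7b", "messages": [{"role": "user", "content": "hi"}]}'

# Anthropic format - what Claude Code emits (POST <ANTHROPIC_BASE_URL>/v1/messages)
curl "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 256, "messages": [{"role": "user", "content": "hi"}]}'

The auth header, the endpoint path, and the required max_tokens field are the main differences LiteLLM papers over.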

Troubleshooting

Ollama

Model not found:
ollama pull model-name
ollama list
Out of memory:
# Use a smaller or more heavily quantized model
ollama pull qwen2.5-coder:3b
ollama pull qwen2.5-coder:7b-q4_K_M
Slow inference:
# Check GPU usage
nvidia-smi
ollama ps
Context too long:
# Raise num_ctx via a custom Modelfile (see Advanced Config above)
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768

LM Studio

  • Verify the server is running (Local Server tab)
  • Check that port 1234 is accessible
  • Ensure the LiteLLM proxy is running (see the port checks below)
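
A quick way to check the last two points is to see what is actually listening on each port (macOS/Linux):

# LM Studio server
lsof -iTCP:1234 -sTCP:LISTEN

# LiteLLM proxy
lsof -iTCP:4000 -sTCP:LISTEN

If either command prints nothing, that component is not running (or is bound to a different port).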

Resources