
Local Models (Ollama & LM Studio)

Run models locally - zero API costs, complete privacy.


Ollama (Recommended)

Installation

# macOS
brew install --cask ollama
 
# Linux
curl -fsSL https://ollama.com/install.sh | sh
 
# Windows
# Download from https://ollama.com/download/windows
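
To confirm the install, check the CLI version and that the local server responds (the Linux install script typically registers the server as a service, the desktop apps start it automatically, and ollama serve starts it by hand):

# CLI should report a version
ollama --version

# Server answers on port 11434 with "Ollama is running"
curl http://localhost:11434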

Setup with agentful

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
# 2. Pull a model
ollama pull qwen2.5-coder:7b
 
# 3. Configure Claude Code
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
 
# 4. Run
claude
/agentful-start
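
Before starting claude, it can help to confirm the model answers through Ollama's API directly (adjust the model name if you pulled a different tag):

# One-off generation via Ollama's native API - should return a JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a one-line hello world in Python.",
  "stream": false
}'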

Recommended Models

# Best for coding (start here)
ollama pull qwen2.5-coder:7b     # 6GB VRAM
ollama pull qwen2.5-coder:14b    # 12GB VRAM
ollama pull qwen2.5-coder:32b    # 24GB VRAM
 
# Best for tool calling
ollama pull llama3.1:8b          # 6GB VRAM
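
To sanity-check that a model fits your hardware, run it once and see where Ollama placed it:

# One-shot prompt straight from the CLI
ollama run qwen2.5-coder:7b "Explain what a mutex is in one sentence."

# The PROCESSOR column shows whether the model is on GPU or CPU
ollama ps

If ollama ps reports CPU, the model is likely spilling out of VRAM; drop to a smaller tag.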

Advanced Config

Increase context window:
# Modelfile - save these lines as ./Modelfile
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768
PARAMETER temperature 0.7
 
# Apply
ollama create qwen-32k -f ./Modelfile
claude --model qwen-32k
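
To verify the custom model was created with the larger context window:

# qwen-32k should appear in the list
ollama list

# Prints the Modelfile, including the num_ctx parameter
ollama show qwen-32k --modelfile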
Persistent config:
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
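
If you switch between local and hosted models, a pair of small shell functions keeps the environments separate (agentful-local and agentful-cloud are hypothetical helper names, not part of agentful):

# Hypothetical helpers - add to ~/.bashrc or ~/.zshrc
agentful-local() {
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_BASE_URL=http://localhost:11434
}
agentful-cloud() {
  unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL
}

Run agentful-local before claude to use Ollama, or agentful-cloud to fall back to your Anthropic account.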

LM Studio

Installation

Download from https://lmstudio.ai (macOS, Windows, Linux)

Setup with agentful

LM Studio requires a LiteLLM proxy in front of it (Ollama doesn't):

# 1. Install LiteLLM
pip install 'litellm[proxy]'
 
# 2. Create config
cat > litellm-config.yaml <<EOF
model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: openai/qwen2.5-coder-7b
      api_base: http://localhost:1234/v1
      api_key: dummy
EOF
 
# 3. Start LM Studio server (port 1234)
 
# 4. Start LiteLLM proxy
litellm --config litellm-config.yaml --port 4000
 
# 5. Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000/anthropic
export ANTHROPIC_AUTH_TOKEN=sk-1234
 
# 6. Run
claude
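
If requests fail, test each hop on its own before launching claude; both LM Studio and LiteLLM expose an OpenAI-compatible /v1/models endpoint (the Bearer key below matches the ANTHROPIC_AUTH_TOKEN set above and may be optional if no master key is configured):

# LM Studio server should list the loaded model
curl http://localhost:1234/v1/models

# LiteLLM proxy should list the mapped model name (claude-sonnet-4-5)
curl http://localhost:4000/v1/models -H "Authorization: Bearer sk-1234"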
Why the extra step?
  • LM Studio exposes an OpenAI-compatible API
  • Claude Code speaks the Anthropic API
  • LiteLLM translates between the two formats (compared below)
  • Ollama has native Anthropic API support, so it needs no proxy
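
To make the difference concrete, here is roughly the same request in each format. These curls are illustrative only; the model names and routes simply follow the config above:

# OpenAI format - what LM Studio understands (POST /v1/chat/completions)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder-7b", "messages": [{"role": "user", "content": "hi"}]}'

# Anthropic format - what Claude Code emits (POST <ANTHROPIC_BASE_URL>/v1/messages)
curl "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 256, "messages": [{"role": "user", "content": "hi"}]}'

The auth header, the endpoint path, and the required max_tokens field are the main differences LiteLLM papers over.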

Troubleshooting

Ollama

Model not found:
ollama pull model-name
ollama list
Out of memory:
# Use a smaller or more heavily quantized model
ollama pull qwen2.5-coder:3b
ollama pull qwen2.5-coder:7b-q4_K_M
Slow inference:
# Check GPU usage
nvidia-smi
ollama ps
Context too long:
# Raise num_ctx via a custom Modelfile (see Advanced Config above)
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768

LM Studio

  • Verify the server is running (Local Server tab)
  • Check that port 1234 is accessible
  • Ensure the LiteLLM proxy is running (see the port checks below)
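
A quick way to check the last two points is to see what is actually listening on each port (macOS/Linux):

# LM Studio server
lsof -iTCP:1234 -sTCP:LISTEN

# LiteLLM proxy
lsof -iTCP:4000 -sTCP:LISTEN

If either command prints nothing, that component is not running (or is bound to a different port).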

Resources