
DeepSeek-V3 Setup

Use DeepSeek-V3 with agentful for massive cost savings (roughly 10x cheaper than Claude Sonnet 4.5 on both input and output tokens) while maintaining strong performance.

Why DeepSeek-V3?

Metric           | DeepSeek-V3 | Claude Sonnet 4.5 | Savings
Input Cost       | $0.27/M     | $3.00/M           | 91% cheaper
Output Cost      | $1.10/M     | $15.00/M          | 93% cheaper
Cached Input     | $0.014/M    | $0.30/M           | 95% cheaper
Context Window   | 128K        | 200K              | -36%
Reasoning Mode   | ✅ Built-in  | ✅ Built-in        | Same
Function Calling | ✅ Native    | ✅ Native          | Same
Best For:
  • Cost-sensitive projects
  • High-volume operations
  • Agent-heavy workflows
  • Mathematical reasoning
  • Thinking/reasoning tasks

Quick Start (5 minutes)

Option 1: LiteLLM Proxy (Recommended)

# Install LiteLLM
pip install 'litellm[proxy]'
 
# Get DeepSeek API key from https://platform.deepseek.com
export DEEPSEEK_API_KEY=sk-...
 
# Start proxy
litellm --model deepseek/deepseek-chat --drop_params --port 4000
 
# Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
 
claude
/agentful-start
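
If you want to sanity-check the proxy before launching Claude Code, send one request through it directly (a minimal smoke test; the model name is the one passed to litellm above):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-chat", "messages": [{"role": "user", "content": "Say hello"}]}'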

Option 2: Direct API (Requires Proxy)

The DeepSeek API is OpenAI-compatible, while Claude Code speaks the Anthropic API, so a translation layer is still required:

# Use OpenAI-compatible endpoint via LiteLLM
export DEEPSEEK_API_BASE=https://api.deepseek.com
export DEEPSEEK_API_KEY=sk-...
 
litellm \
  --model deepseek/deepseek-chat \
  --api_base $DEEPSEEK_API_BASE \
  --drop_params

Detailed Setup

Step 1: Get DeepSeek API Key

  1. Visit platform.deepseek.com
  2. Sign up / Log in
  3. Navigate to API Keys
  4. Create new key
  5. Copy key (starts with sk-...)
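
Before wiring anything else up, you can confirm the key works with a direct call (DeepSeek's endpoint is OpenAI-compatible):

curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "ping"}]}'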

Pricing (as of Jan 2025):

  • Input tokens: $0.27/M
  • Output tokens: $1.10/M
  • Cached input: $0.014/M (95% discount!)
  • Free tier: $5 credit for new users

Step 2: Install LiteLLM

Docker:
docker run -d \
  --name litellm-deepseek \
  --restart unless-stopped \
  -p 4000:4000 \
  -e DEEPSEEK_API_KEY=$DEEPSEEK_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --model deepseek/deepseek-chat \
  --drop_params
pip:
pip install 'litellm[proxy]'

Step 3: Configure LiteLLM

Create litellm_config.yaml:

model_list:
  - model_name: deepseek-chat
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true
 
  - model_name: deepseek-reasoner
    litellm_params:
      model: deepseek/deepseek-reasoner
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true
 
litellm_settings:
  drop_params: true
  cache: true  # Enable LiteLLM response caching (on top of DeepSeek's cached-input discount)

Start:

litellm --config litellm_config.yaml --port 4000
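
The proxy should now list both aliases from the config (LiteLLM exposes an OpenAI-style model listing):

curl http://localhost:4000/v1/models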

Step 4: Configure Claude Code

Persistent (~/.claude/settings.json):

{
  "environmentVariables": {
    "ANTHROPIC_BASE_URL": "http://localhost:4000",
    "ANTHROPIC_API_KEY": "your_deepseek_api_key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
Session-based:
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
claude

Step 5: Verify Setup

claude
 
You: What model are you?
Assistant: I am DeepSeek-V3, a large language model developed by DeepSeek.

Model Variants

deepseek-chat (Recommended)

Standard chat model
  • Context: 128K tokens
  • Output: 8K tokens
  • Strengths: General tasks, coding, analysis
  • Cost: $0.27/M input, $1.10/M output
  • Use for: 90% of tasks
model_name: deepseek-chat
litellm_params:
  model: deepseek/deepseek-chat
  drop_params: true

deepseek-reasoner

Extended reasoning model (like o1)

  • Context: 128K tokens
  • Reasoning tokens: Uses extra tokens for "thinking"
  • Cost: $0.55/M input, $2.19/M output
  • Use for: Complex math, logic puzzles, algorithm design

Note: The reasoner costs roughly 2x the chat model but is still ~84% cheaper than Claude Sonnet 4.5.

model_name: deepseek-reasoner
litellm_params:
  model: deepseek/deepseek-reasoner
  drop_params: true

Advanced Features

Prompt Caching (95% Savings!)

DeepSeek applies its cached-input discount automatically on the API side; you can layer LiteLLM's own response cache on top so repeated prompts skip the API entirely:

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # 1 hour cache
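
The cache_params above assume a Redis instance on localhost:6379; one quick way to provide it (assumes Docker):

docker run -d --name litellm-cache -p 6379:6379 redis:7-alpine
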
Example savings:
First request: 50K tokens × $0.27/M = $0.0135
Cached request: 50K tokens × $0.014/M = $0.0007
Savings: 95% ($0.0128 saved per request)

Function Calling

DeepSeek supports native function calling:

# Works automatically with agentful agents
/agentful-start
 
# Agent uses tools seamlessly
[Agent: Orchestrator]
- Using tool: read_file
- Using tool: run_tests
- Using tool: git_commit
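
You can also exercise tool calling through the proxy by hand. A sketch using a hypothetical get_weather tool in the standard OpenAI tools format, sent to the deepseek-chat alias from Step 3:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

The response should contain a tool_calls entry requesting get_weather rather than a plain text answer.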

Reasoning Mode

Use deepseek-reasoner for complex tasks:

{
  "environmentVariables": {
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
When to use reasoner:
  • Mathematical proofs
  • Algorithm optimization
  • Complex debugging
  • Multi-step planning
Cost comparison:
DeepSeek Reasoner: $0.55/M input, $2.19/M output (with thinking tokens)
Claude Opus: $15/M input, $75/M output
Savings: 96% cheaper
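
To route a single query to the reasoner without changing your defaults, you can call the deepseek-reasoner alias from Step 3 directly (a sketch):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-reasoner", "messages": [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]}'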

Integration with agentful

All Agents Work

/agentful-start  # Orchestrator
/agentful-product  # Architect
# All agents use DeepSeek automatically

Recommended Configuration

Best models per agent:
Agent        | Model             | Why
Orchestrator | deepseek-chat     | Fast planning
Architect    | deepseek-reasoner | Deep system design
Backend      | deepseek-chat     | Strong coding
Frontend     | deepseek-chat     | UI generation
Tester       | deepseek-chat     | Test creation
Reviewer     | deepseek-chat     | Code review
Fixer        | deepseek-chat     | Bug fixes

Cost example (1M token workflow):

With Claude Sonnet 4.5:
Input: 500K × $3/M = $1.50
Output: 500K × $15/M = $7.50
Total: $9.00
 
With DeepSeek:
Input: 500K × $0.27/M = $0.135
Output: 500K × $1.10/M = $0.55
Total: $0.685
 
Savings: $8.315 (92%)

Performance Comparison

Code Generation (SWE-bench)

Claude Sonnet 4.5:  77.2%
DeepSeek-V3:        72.4%
GPT-4o:             70.5%
-4.8% vs Claude, but 92% cheaper

Mathematical Reasoning (MATH-500)

DeepSeek-V3:        90.2%
Claude Sonnet 4.5:  87.0%
GPT-4o:             85.3%
+3.2% better than Claude!

Function Calling (Berkeley Function Calling)

Claude Sonnet 4.5:  92.1%
DeepSeek-V3:        89.7%
GPT-4o:             88.4%
-2.4% vs Claude, still excellent

Cost Optimization Strategies

1. Hybrid Approach

Use DeepSeek for 90% of tasks, Claude for critical 10%:

# Default to DeepSeek
export ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat
 
# Manual override for important code
claude --model claude-sonnet-4-5-20250929

Savings: ~82% overall (blended input rate: 0.9 × $0.27 + 0.1 × $3.00 ≈ $0.54/M, which is 82% below Claude's $3.00/M; output blends similarly)

2. Enable Caching

litellm_settings:
  cache: true

Typical savings: 30-50% on repeated contexts

3. Use Reasoner Sparingly

# Use chat model by default
export ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat
 
# Reserve reasoner for complex tasks
export ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner
When to use reasoner:
  • Algorithm design: ✅
  • Simple CRUD: ❌
  • Math proofs: ✅
  • Basic refactoring: ❌

Troubleshooting

API Rate Limits

# DeepSeek limits (per minute):
# - Free tier: 60 RPM, 500K TPM
# - Paid tier: 200 RPM, 2M TPM
 
# Add retry logic in litellm_config.yaml:
litellm_settings:
  num_retries: 3
  timeout: 600

Quality Issues

# If output quality is lower than expected:
 
# 1. Use reasoner model for complex tasks
export ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner
 
# 2. Add more context/examples in prompts
# 3. Use temperature=0 for deterministic output
 
# 4. Switch to Claude for critical code
export ANTHROPIC_BASE_URL=https://api.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...  # your Anthropic key, not the DeepSeek one

Connection Errors

# Test DeepSeek API directly
curl https://api.deepseek.com/v1/models \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY"
 
# Check LiteLLM logs
litellm --model deepseek/deepseek-chat --debug

When to Use DeepSeek vs Claude

Use DeepSeek for:

  • ✅ High-volume operations (tests, documentation)
  • ✅ Mathematical/algorithmic tasks
  • ✅ Cost-sensitive projects
  • ✅ Experimentation/prototyping
  • ✅ Agent-heavy workflows

Use Claude for:

  • ✅ Production-critical endpoints
  • ✅ Complex instruction following
  • ✅ Security-sensitive code
  • ✅ Long context (200K vs 128K)
  • ✅ Best-in-class function calling

Hybrid Strategy:

# 70% DeepSeek, 30% Claude
# Average cost: ~$1.08/M input, ~$5.30/M output
# Savings: 64% vs pure Claude
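
A pair of shell helpers makes the split easy to flip per session (a sketch; the function names are arbitrary, the variables are the ones used throughout this guide):

# Route new Claude Code sessions through the local DeepSeek proxy
use-deepseek() {
  export ANTHROPIC_BASE_URL=http://localhost:4000
  export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
}

# Route new sessions straight to Anthropic
use-claude() {
  export ANTHROPIC_BASE_URL=https://api.anthropic.com
  export ANTHROPIC_API_KEY=sk-ant-...  # your real Anthropic key
}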

Production Deployment

Docker Compose

services:
  litellm-deepseek:
    image: ghcr.io/berriai/litellm:main-latest
    restart: unless-stopped
    ports:
      - "4000:4000"
    environment:
      DEEPSEEK_API_KEY: ${DEEPSEEK_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
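
Bring it up and confirm the healthcheck passes (assumes Docker Compose v2):

docker compose up -d
curl -f http://localhost:4000/health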

Monitoring

# View usage/costs
curl http://localhost:4000/spend/tags
 
# Response:
{
  "deepseek-chat": {
    "total_cost": 0.685,
    "total_tokens": 1000000
  }
}
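
To keep an eye on spend over time, you can poll the same endpoint (assumes jq and watch are installed):

watch -n 60 'curl -s http://localhost:4000/spend/tags | jq'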
