
DeepSeek-V3 Setup

Use DeepSeek-V3 with agentful for massive cost savings (roughly 10x cheaper than Claude Sonnet 4.5 on both input and output tokens) while maintaining strong performance.

Why DeepSeek-V3?

Metric           | DeepSeek-V3 | Claude Sonnet 4.5 | Savings
Input Cost       | $0.27/M     | $3.00/M           | 91% cheaper
Output Cost      | $1.10/M     | $15.00/M          | 93% cheaper
Cached Input     | $0.014/M    | $0.30/M           | 95% cheaper
Context Window   | 128K        | 200K              | -36%
Reasoning Mode   | ✅ Built-in  | ✅ Built-in        | Same
Function Calling | ✅ Native    | ✅ Native          | Same
Best For:
  • Cost-sensitive projects
  • High-volume operations
  • Agent-heavy workflows
  • Mathematical reasoning
  • Thinking/reasoning tasks

Quick Start (5 minutes)

Option 1: LiteLLM Proxy (Recommended)

# Install LiteLLM
pip install 'litellm[proxy]'
 
# Get DeepSeek API key from https://platform.deepseek.com
export DEEPSEEK_API_KEY=sk-...
 
# Start proxy
litellm --model deepseek/deepseek-chat --drop_params --port 4000
 
# Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
 
claude
/agentful-start
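
If you want to sanity-check the proxy before launching Claude Code, send one request through it directly (a minimal smoke test; the model name is the one passed to litellm above):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-chat", "messages": [{"role": "user", "content": "Say hello"}]}'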

Option 2: Direct API (Requires Proxy)

The DeepSeek API is OpenAI-compatible, while Claude Code speaks the Anthropic API, so a translation layer is still required:

# Use OpenAI-compatible endpoint via LiteLLM
export DEEPSEEK_API_BASE=https://api.deepseek.com
export DEEPSEEK_API_KEY=sk-...
 
litellm \
  --model deepseek/deepseek-chat \
  --api_base $DEEPSEEK_API_BASE \
  --drop_params

Detailed Setup

Step 1: Get DeepSeek API Key

  1. Visit platform.deepseek.com
  2. Sign up / Log in
  3. Navigate to API Keys
  4. Create new key
  5. Copy key (starts with sk-...)
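
Before wiring anything else up, you can confirm the key works with a direct call (DeepSeek's endpoint is OpenAI-compatible):

curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "ping"}]}'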

Pricing (as of Jan 2025):

  • Input tokens: $0.27/M
  • Output tokens: $1.10/M
  • Cached input: $0.014/M (95% discount!)
  • Free tier: $5 credit for new users

Step 2: Install LiteLLM

Docker:
docker run -d \
  --name litellm-deepseek \
  --restart unless-stopped \
  -p 4000:4000 \
  -e DEEPSEEK_API_KEY=$DEEPSEEK_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --model deepseek/deepseek-chat \
  --drop_params
pip:
pip install 'litellm[proxy]'

Step 3: Configure LiteLLM

Create litellm_config.yaml:

model_list:
  - model_name: deepseek-chat
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true
 
  - model_name: deepseek-reasoner
    litellm_params:
      model: deepseek/deepseek-reasoner
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true
 
litellm_settings:
  drop_params: true
  cache: true  # Enable LiteLLM response caching (on top of DeepSeek's cached-input discount)

Start:

litellm --config litellm_config.yaml --port 4000
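
The proxy should now list both aliases from the config (LiteLLM exposes an OpenAI-style model listing):

curl http://localhost:4000/v1/models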

Step 4: Configure Claude Code

Persistent (~/.claude/settings.json):

{
  "environmentVariables": {
    "ANTHROPIC_BASE_URL": "http://localhost:4000",
    "ANTHROPIC_API_KEY": "your_deepseek_api_key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
Session-based:
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
claude

Step 5: Verify Setup

claude
 
You: What model are you?
Assistant: I am DeepSeek-V3, a large language model developed by DeepSeek.

Model Variants

deepseek-chat (Recommended)

Standard chat model
  • Context: 128K tokens
  • Output: 8K tokens
  • Strengths: General tasks, coding, analysis
  • Cost: $0.27/M input, $1.10/M output
  • Use for: 90% of tasks
model_name: deepseek-chat
litellm_params:
  model: deepseek/deepseek-chat
  drop_params: true

deepseek-reasoner

Extended reasoning model (like o1)

  • Context: 128K tokens
  • Reasoning tokens: Uses extra tokens for "thinking"
  • Cost: $0.55/M input, $2.19/M output
  • Use for: Complex math, logic puzzles, algorithm design

Note: The reasoner costs roughly 2x the chat model but is still ~84% cheaper than Claude Sonnet 4.5.

model_name: deepseek-reasoner
litellm_params:
  model: deepseek/deepseek-reasoner
  drop_params: true

Advanced Features

Prompt Caching (95% Savings!)

DeepSeek applies its cached-input discount automatically on the API side; you can layer LiteLLM's own response cache on top so repeated prompts skip the API entirely:

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # 1 hour cache
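
The cache_params above assume a Redis instance on localhost:6379; one quick way to provide it (assumes Docker):

docker run -d --name litellm-cache -p 6379:6379 redis:7-alpine
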
Example savings:
First request: 50K tokens × $0.27/M = $0.0135
Cached request: 50K tokens × $0.014/M = $0.0007
Savings: 95% ($0.0128 saved per request)

Function Calling

DeepSeek supports native function calling:

# Works automatically with agentful agents
/agentful-start
 
# Agent uses tools seamlessly
[Agent: Orchestrator]
- Using tool: read_file
- Using tool: run_tests
- Using tool: git_commit
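
You can also exercise tool calling through the proxy by hand. A sketch using a hypothetical get_weather tool in the standard OpenAI tools format, sent to the deepseek-chat alias from Step 3:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

The response should contain a tool_calls entry requesting get_weather rather than a plain text answer.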

Reasoning Mode

Use deepseek-reasoner for complex tasks:

{
  "environmentVariables": {
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
When to use reasoner:
  • Mathematical proofs
  • Algorithm optimization
  • Complex debugging
  • Multi-step planning
Cost comparison:
DeepSeek Reasoner: $0.55/M input, $2.19/M output (with thinking tokens)
Claude Opus: $15/M input, $75/M output
Savings: 96% cheaper
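
To route a single query to the reasoner without changing your defaults, you can call the deepseek-reasoner alias from Step 3 directly (a sketch):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-reasoner", "messages": [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]}'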

Integration with agentful

All Agents Work

/agentful-start  # Orchestrator
/agentful-product  # Architect
# All agents use DeepSeek automatically

Recommended Configuration

Best models per agent:
Agent        | Model             | Why
Orchestrator | deepseek-chat     | Fast planning
Architect    | deepseek-reasoner | Deep system design
Backend      | deepseek-chat     | Strong coding
Frontend     | deepseek-chat     | UI generation
Tester       | deepseek-chat     | Test creation
Reviewer     | deepseek-chat     | Code review
Fixer        | deepseek-chat     | Bug fixes

Cost example (1M token workflow):

With Claude Sonnet 4.5:
Input: 500K × $3/M = $1.50
Output: 500K × $15/M = $7.50
Total: $9.00
 
With DeepSeek:
Input: 500K × $0.27/M = $0.135
Output: 500K × $1.10/M = $0.55
Total: $0.685
 
Savings: $8.315 (92%)

Performance Comparison

Code Generation (SWE-bench)

Claude Sonnet 4.5:  77.2%
DeepSeek-V3:        72.4%
GPT-4o:             70.5%
-4.8% vs Claude, but 92% cheaper

Mathematical Reasoning (MATH-500)

DeepSeek-V3:        90.2%
Claude Sonnet 4.5:  87.0%
GPT-4o:             85.3%
+3.2% better than Claude!

Function Calling (Berkeley Function Calling)

Claude Sonnet 4.5:  92.1%
DeepSeek-V3:        89.7%
GPT-4o:             88.4%
-2.4% vs Claude, still excellent

Cost Optimization Strategies

1. Hybrid Approach

Use DeepSeek for 90% of tasks, Claude for critical 10%:

# Default to DeepSeek
export ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat
 
# Manual override for important code
claude --model claude-sonnet-4-5-20250929

Savings: ~82% overall (blended input rate: 0.9 × $0.27 + 0.1 × $3.00 ≈ $0.54/M, which is 82% below Claude's $3.00/M; output blends similarly)

2. Enable Caching

litellm_settings:
  cache: true

Typical savings: 30-50% on repeated contexts

3. Use Reasoner Sparingly

# Use chat model by default
export ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat
 
# Reserve reasoner for complex tasks
export ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner
When to use reasoner:
  • Algorithm design: ✅
  • Simple CRUD: ❌
  • Math proofs: ✅
  • Basic refactoring: ❌

Troubleshooting

API Rate Limits

# DeepSeek limits (per minute):
# - Free tier: 60 RPM, 500K TPM
# - Paid tier: 200 RPM, 2M TPM
 
# Add retry logic in litellm_config.yaml:
litellm_settings:
  num_retries: 3
  timeout: 600

Quality Issues

# If output quality is lower than expected:
 
# 1. Use reasoner model for complex tasks
export ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner
 
# 2. Add more context/examples in prompts
# 3. Use temperature=0 for deterministic output
 
# 4. Switch to Claude for critical code
export ANTHROPIC_BASE_URL=https://api.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...  # your Anthropic key, not the DeepSeek one

Connection Errors

# Test DeepSeek API directly
curl https://api.deepseek.com/v1/models \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY"
 
# Check LiteLLM logs
litellm --model deepseek/deepseek-chat --debug

When to Use DeepSeek vs Claude

Use DeepSeek for:

  • ✅ High-volume operations (tests, documentation)
  • ✅ Mathematical/algorithmic tasks
  • ✅ Cost-sensitive projects
  • ✅ Experimentation/prototyping
  • ✅ Agent-heavy workflows

Use Claude for:

  • ✅ Production-critical endpoints
  • ✅ Complex instruction following
  • ✅ Security-sensitive code
  • ✅ Long context (200K vs 128K)
  • ✅ Best-in-class function calling

Hybrid Strategy:

# 70% DeepSeek, 30% Claude
# Average cost: ~$1.08/M input, ~$5.30/M output
# Savings: 64% vs pure Claude
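
A pair of shell helpers makes the split easy to flip per session (a sketch; the function names are arbitrary, the variables are the ones used throughout this guide):

# Route new Claude Code sessions through the local DeepSeek proxy
use-deepseek() {
  export ANTHROPIC_BASE_URL=http://localhost:4000
  export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
}

# Route new sessions straight to Anthropic
use-claude() {
  export ANTHROPIC_BASE_URL=https://api.anthropic.com
  export ANTHROPIC_API_KEY=sk-ant-...  # your real Anthropic key
}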

Production Deployment

Docker Compose

services:
  litellm-deepseek:
    image: ghcr.io/berriai/litellm:main-latest
    restart: unless-stopped
    ports:
      - "4000:4000"
    environment:
      DEEPSEEK_API_KEY: ${DEEPSEEK_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
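
Bring it up and confirm the healthcheck passes (assumes Docker Compose v2):

docker compose up -d
curl -f http://localhost:4000/health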

Monitoring

# View usage/costs
curl http://localhost:4000/spend/tags
 
# Response:
{
  "deepseek-chat": {
    "total_cost": 0.685,
    "total_tokens": 1000000
  }
}
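
To keep an eye on spend over time, you can poll the same endpoint (assumes jq and watch are installed):

watch -n 60 'curl -s http://localhost:4000/spend/tags | jq'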
