# DeepSeek-V3 Setup
Use DeepSeek-V3 with agentful for large cost savings (roughly 10x cheaper than Claude, per the pricing below) while maintaining strong performance.
## Why DeepSeek-V3?
| Metric | DeepSeek-V3 | Claude Sonnet 4.5 | Difference |
|---|---|---|---|
| Input Cost | $0.27/M | $3.00/M | 91% cheaper |
| Output Cost | $1.10/M | $15.00/M | 93% cheaper |
| Cached Input | $0.014/M | $0.30/M | 95% cheaper |
| Context Window | 128K | 200K | -36% |
| Reasoning Mode | ✅ Built-in | ✅ Built-in | Same |
| Function Calling | ✅ | ✅ | Same |
DeepSeek-V3 is a strong fit for:

- Cost-sensitive projects
- High-volume operations
- Agent-heavy workflows
- Mathematical reasoning
- Thinking/reasoning tasks
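To see what those rates mean for a concrete workload, here is a minimal sketch (plain Python, rates taken from the table above; adjust the token counts for your own usage):

```python
# Rough per-workload cost comparison using the per-million-token rates above.
RATES = {
    "deepseek-chat": {"input": 0.27, "output": 1.10},       # $/M tokens
    "claude-sonnet-4.5": {"input": 3.00, "output": 15.00},  # $/M tokens
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

for model in RATES:
    cost = workload_cost(model, 500_000, 500_000)
    print(f"{model}: ${cost:.3f}")
# deepseek-chat: $0.685
# claude-sonnet-4.5: $9.000
```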
## Quick Start (5 minutes)
### Option 1: LiteLLM Proxy (Recommended)
```bash
# Install LiteLLM
pip install 'litellm[proxy]'

# Get DeepSeek API key from https://platform.deepseek.com
export DEEPSEEK_API_KEY=sk-...

# Start proxy
litellm --model deepseek/deepseek-chat --drop_params --port 4000

# Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
claude
/agentful-start
```
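Once the proxy is up, you can confirm it is routing to DeepSeek before launching Claude Code. A minimal sketch using the openai Python SDK against the local proxy (the API key is only checked if you configured proxy auth, and the model name assumes the `--model deepseek/deepseek-chat` start command above):

```python
import os
from openai import OpenAI

# Point the OpenAI-compatible client at the local LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key=os.environ["DEEPSEEK_API_KEY"])

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # matches the --model flag used to start the proxy
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.choices[0].message.content)  # should print something like "ok"
```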
### Option 2: Direct API (Requires Proxy)

DeepSeek's API is OpenAI-compatible, while Claude Code speaks Anthropic's API, so you still need a translation layer:
```bash
# Use OpenAI-compatible endpoint via LiteLLM
export DEEPSEEK_API_BASE=https://api.deepseek.com
export DEEPSEEK_API_KEY=sk-...

litellm \
  --model deepseek/deepseek-chat \
  --api_base $DEEPSEEK_API_BASE \
  --drop_params
```

## Detailed Setup
### Step 1: Get DeepSeek API Key
1. Visit platform.deepseek.com
2. Sign up / log in
3. Navigate to API Keys
4. Create a new key
5. Copy the key (starts with `sk-...`)
Pricing (as of Jan 2025):
- Input tokens: $0.27/M
- Output tokens: $1.10/M
- Cached input: $0.014/M (95% discount!)
- Free tier: $5 credit for new users
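With the key in hand, you can confirm it works before wiring up the proxy. A minimal sketch using the openai SDK, which DeepSeek's OpenAI-compatible endpoint accepts:

```python
import os
from openai import OpenAI

# Talk to DeepSeek directly over its OpenAI-compatible API.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Say hello in five words or fewer."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
print(resp.usage)  # token counts you will be billed for
```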
### Step 2: Install LiteLLM
Docker:

```bash
docker run -d \
  --name litellm-deepseek \
  --restart unless-stopped \
  -p 4000:4000 \
  -e DEEPSEEK_API_KEY=$DEEPSEEK_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --model deepseek/deepseek-chat \
  --drop_params
```

pip:

```bash
pip install 'litellm[proxy]'
```

### Step 3: Configure LiteLLM
Create `litellm_config.yaml`:
```yaml
model_list:
  - model_name: deepseek-chat
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true

  - model_name: deepseek-reasoner
    litellm_params:
      model: deepseek/deepseek-reasoner
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true

litellm_settings:
  drop_params: true
  cache: true  # Enable prompt caching (95% savings!)
```

Start:
```bash
litellm --config litellm_config.yaml --port 4000
```
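To confirm both aliases are registered, you can list the proxy's models. A sketch using requests against LiteLLM's OpenAI-style `/v1/models` endpoint (add an `Authorization: Bearer` header if you configured proxy auth):

```python
import requests

resp = requests.get("http://localhost:4000/v1/models", timeout=10)
resp.raise_for_status()

# Expect to see both model_name entries from litellm_config.yaml.
for model in resp.json()["data"]:
    print(model["id"])  # deepseek-chat, deepseek-reasoner
```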
### Step 4: Configure Claude Code

Persistent (`~/.claude/settings.json`):
```json
{
  "environmentVariables": {
    "ANTHROPIC_BASE_URL": "http://localhost:4000",
    "ANTHROPIC_API_KEY": "your_deepseek_api_key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "deepseek-chat",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
```

Per-session:

```bash
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$DEEPSEEK_API_KEY
claude
```

### Step 5: Verify Setup
```text
$ claude

You: What model are you?
Assistant: I am DeepSeek-V3, a large language model developed by DeepSeek.
```

## Model Variants
### deepseek-chat (Recommended)
Standard chat model.

- Context: 128K tokens
- Output: 8K tokens
- Strengths: General tasks, coding, analysis
- Cost: $0.27/M input, $1.10/M output
- Use for: 90% of tasks
```yaml
- model_name: deepseek-chat
```

### deepseek-reasoner
Extended reasoning model (like o1)
- Context: 128K tokens
- Reasoning tokens: Uses extra tokens for "thinking"
- Cost: $0.55/M input, $2.19/M output
- Use for: Complex math, logic puzzles, algorithm design
Note: The reasoner costs about 2x more than deepseek-chat but is still roughly 84% cheaper than Claude.
```yaml
- model_name: deepseek-reasoner
  litellm_params:
    model: deepseek/deepseek-reasoner
    drop_params: true
```

## Advanced Features
### Prompt Caching (95% Savings!)
DeepSeek offers aggressive caching discounts:
```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # 1 hour cache
```

```text
First request:  50K tokens × $0.27/M  = $0.0135
Cached request: 50K tokens × $0.014/M = $0.0007
Savings: 95% ($0.0128 saved per request)
```
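The 95% rate applies on DeepSeek's side for repeated prompt prefixes, and you can watch it happen in a response's usage stats. A minimal sketch, assuming the `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` usage fields described in DeepSeek's caching docs:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

long_prefix = "You are a meticulous code reviewer. " * 500  # shared, cacheable prefix

for attempt in range(2):
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": long_prefix},
            {"role": "user", "content": "Reply with: ok"},
        ],
    )
    usage = resp.usage.model_dump()
    # The second call should report most prefix tokens as cache hits.
    print(attempt, usage.get("prompt_cache_hit_tokens"), usage.get("prompt_cache_miss_tokens"))
```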
### Function Calling

DeepSeek supports native function calling:
```text
# Works automatically with agentful agents
/agentful-start

# Agent uses tools seamlessly
[Agent: Orchestrator]
- Using tool: read_file
- Using tool: run_tests
- Using tool: git_commit
```
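Under the hood this is the standard OpenAI tools format, which DeepSeek accepts. A minimal direct sketch (the `get_weather` tool is a made-up example, not part of agentful):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # get_weather {"city": "Paris"}
```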
### Reasoning Mode

Use deepseek-reasoner for complex tasks:
```json
{
  "environmentVariables": {
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-reasoner"
  }
}
```

Ideal for:

- Mathematical proofs
- Algorithm optimization
- Complex debugging
- Multi-step planning
```text
DeepSeek Reasoner: $0.55 input, $2.19 output (with thinking tokens)
Claude Opus:       $15 input, $75 output
Savings: 96% cheaper
```
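When calling the reasoner directly, its chain of thought comes back separately from the final answer. A minimal sketch, assuming the `reasoning_content` field that DeepSeek's reasoner API documents:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 589 prime? Answer yes or no."}],
)

msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # the "thinking" tokens (billed as output)
print(msg.content)                              # the final answer
```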
## Integration with agentful

### All Agents Work
```bash
/agentful-start    # Orchestrator
/agentful-product  # Architect

# All agents use DeepSeek automatically
```

### Recommended Configuration
Best models per agent:

| Agent | Model | Why |
|---|---|---|
| Orchestrator | deepseek-chat | Fast planning |
| Architect | deepseek-reasoner | Deep system design |
| Backend | deepseek-chat | Strong coding |
| Frontend | deepseek-chat | UI generation |
| Tester | deepseek-chat | Test creation |
| Reviewer | deepseek-chat | Code review |
| Fixer | deepseek-chat | Bug fixes |
Cost example (1M token workflow):
```text
With Claude Sonnet 4.5:
  Input:  500K × $3/M   = $1.50
  Output: 500K × $15/M  = $7.50
  Total:  $9.00

With DeepSeek:
  Input:  500K × $0.27/M = $0.135
  Output: 500K × $1.10/M = $0.55
  Total:  $0.685

Savings: $8.315 (92%)
```

## Performance Comparison
### Code Generation (SWE-bench)
```text
Claude Sonnet 4.5: 77.2%
DeepSeek-V3:       72.4%
GPT-4o:            70.5%
```

### Mathematical Reasoning (MATH-500)
```text
DeepSeek-V3:       90.2%
Claude Sonnet 4.5: 87.0%
GPT-4o:            85.3%
```

### Function Calling (Berkeley Function Calling)
```text
Claude Sonnet 4.5: 92.1%
DeepSeek-V3:       89.7%
GPT-4o:            88.4%
```

## Cost Optimization Strategies
### 1. Hybrid Approach
Use DeepSeek for 90% of tasks, Claude for critical 10%:
```bash
# Default to DeepSeek
export ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat

# Manual override for important code
claude --model claude-sonnet-4-5-20250929
```

Savings: ~82% overall
### 2. Enable Caching
```yaml
litellm_settings:
  cache: true
```

Typical savings: 30-50% on repeated contexts
### 3. Use Reasoner Sparingly
```bash
# Use chat model by default
ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-chat

# Reserve reasoner for complex tasks
ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner
```

When to reach for the reasoner (see the routing sketch after this list):

- Algorithm design: ✅
- Simple CRUD: ❌
- Math proofs: ✅
- Basic refactoring: ❌
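If you drive model selection programmatically, a trivial keyword-based router captures the same rule of thumb (a hypothetical sketch; the keyword list and function are illustrative, not part of agentful):

```python
# Hypothetical task router: default to the cheap chat model,
# escalate to the reasoner only for reasoning-heavy work.
REASONER_HINTS = ("prove", "proof", "algorithm", "complexity", "optimize", "puzzle")

def pick_model(task_description: str) -> str:
    text = task_description.lower()
    if any(hint in text for hint in REASONER_HINTS):
        return "deepseek-reasoner"
    return "deepseek-chat"

print(pick_model("Design an algorithm for interval scheduling"))  # deepseek-reasoner
print(pick_model("Add a CRUD endpoint for users"))                # deepseek-chat
```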
## Troubleshooting
### API Rate Limits
DeepSeek limits (per minute):

- Free tier: 60 RPM, 500K TPM
- Paid tier: 200 RPM, 2M TPM

Add retry logic in `litellm_config.yaml`:

```yaml
litellm_settings:
  num_retries: 3
  timeout: 600
```
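For calls made outside the proxy, a simple exponential backoff achieves the same thing. A minimal sketch using the openai SDK's RateLimitError:

```python
import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def chat_with_backoff(messages, retries=3):
    """Retry on rate limits with exponential backoff: 1s, 2s, 4s."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model="deepseek-chat", messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limited after retries")

resp = chat_with_backoff([{"role": "user", "content": "ping"}])
print(resp.choices[0].message.content)
```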
### Quality Issues

```bash
# If output quality is lower than expected:

# 1. Use reasoner model for complex tasks
export ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-reasoner

# 2. Add more context/examples in prompts
# 3. Use temperature=0 for deterministic output

# 4. Switch to Claude for critical code
export ANTHROPIC_BASE_URL=https://api.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...  # your Anthropic API key
```

### Connection Errors
```bash
# Test DeepSeek API directly
curl https://api.deepseek.com/v1/models \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY"

# Check LiteLLM logs
litellm --model deepseek/deepseek-chat --debug
```

## When to Use DeepSeek vs Claude
Use DeepSeek for:
- ✅ High-volume operations (tests, documentation)
- ✅ Mathematical/algorithmic tasks
- ✅ Cost-sensitive projects
- ✅ Experimentation/prototyping
- ✅ Agent-heavy workflows
Use Claude for:
- ✅ Production-critical endpoints
- ✅ Complex instruction following
- ✅ Security-sensitive code
- ✅ Long context (200K vs 128K)
- ✅ Best-in-class function calling
Hybrid Strategy:

```text
# 70% DeepSeek, 30% Claude
# Average cost: ~$1.08/M input, ~$5.30/M output
# Savings: 64% vs pure Claude
```
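The blend math, for checking your own ratio (plain Python, rates from the pricing table above):

```python
# Blended per-million-token cost for a DeepSeek/Claude traffic mix.
def blended(share_deepseek: float, deepseek_rate: float, claude_rate: float) -> float:
    return share_deepseek * deepseek_rate + (1 - share_deepseek) * claude_rate

print(f"input:  ${blended(0.70, 0.27, 3.00):.2f}/M")   # ~$1.09/M
print(f"output: ${blended(0.70, 1.10, 15.00):.2f}/M")  # ~$5.27/M
print(f"output savings vs pure Claude: {1 - blended(0.70, 1.10, 15.00) / 15.00:.0%}")
```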
## Production Deployment

### Docker Compose
```yaml
services:
  litellm-deepseek:
    image: ghcr.io/berriai/litellm:main-latest
    restart: unless-stopped
    ports:
      - "4000:4000"
    environment:
      DEEPSEEK_API_KEY: ${DEEPSEEK_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```
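In CI or deploy scripts you may want to block until the container reports healthy. A small sketch polling the same `/health` endpoint the healthcheck uses (add an Authorization header if you run the proxy with auth):

```python
import time
import requests

def wait_for_proxy(url: str = "http://localhost:4000/health", timeout: float = 60.0) -> None:
    """Poll the LiteLLM health endpoint until it responds OK or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).ok:
                return
        except requests.ConnectionError:
            pass  # container still starting
        time.sleep(2)
    raise TimeoutError(f"proxy not healthy after {timeout}s")

wait_for_proxy()
```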
### Monitoring

```bash
# View usage/costs
curl http://localhost:4000/spend/tags
```

Response:

```json
{
  "deepseek-chat": {
    "total_cost": 0.685,
    "total_tokens": 1000000
  }
}
```

## Resources
- DeepSeek Platform: https://platform.deepseek.com
- Documentation: https://api-docs.deepseek.com
- Pricing: https://platform.deepseek.com/api-docs/pricing
- LiteLLM Docs: https://docs.litellm.ai
- Model Card: https://huggingface.co/deepseek-ai/DeepSeek-V3
## Next Steps
- Try OpenAI models for better instruction following
- Configure GLM-4.7 for even cheaper alternative
- Run local models for complete privacy
- Learn cost optimization