# Google Gemini 2.0 Setup

Use Google Gemini 2.0 with agentful for a 1M-token context window and multimodal capabilities via a LiteLLM proxy.

## Why Gemini 2.0?
| Feature | Gemini 2.0 Flash | Claude Sonnet 4.5 |
|---|---|---|
| Input Cost | $0.075/M | $3.00/M |
| Output Cost | $0.30/M | $15.00/M |
| Context | 1M tokens | 200K tokens |
| Thinking Mode | ✅ | ✅ |
| Vision | ✅ Advanced | ✅ |
| Audio | ✅ Native | ❌ |
| Video | ✅ Native | ❌ |
Best for:

- Analyzing entire codebases (1M context!)
- Multimodal tasks (image/video/audio)
- Cost-sensitive projects (~97% cheaper)
- Long documents and conversations
## Quick Start (5 minutes)
```bash
# Install LiteLLM
pip install 'litellm[proxy]'

# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY=AIzaSy...

# Start proxy
litellm --model gemini/gemini-2.0-flash-exp --drop_params --port 4000

# Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$GEMINI_API_KEY
claude
/agentful-start
```
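Before moving on, it's worth confirming the proxy answers. A minimal sanity check, assuming LiteLLM's OpenAI-compatible `/v1/models` route on the default port and no proxy auth key configured:

```bash
# List the models the proxy is serving; a JSON model list means the proxy is up
curl http://localhost:4000/v1/models
```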
## Detailed Setup

### Step 1: Get Gemini API Key

- Visit https://aistudio.google.com/apikey
- Create a new API key
- Copy the key (it starts with `AIzaSy...`)
Pricing (as of Jan 2025):
- Gemini 2.0 Flash: $0.075/M input, $0.30/M output
- Gemini 1.5 Pro: $1.25/M input, $5.00/M output
- Free tier: 1500 requests/day
### Step 2: Install LiteLLM
```bash
pip install 'litellm[proxy]'

# Or with Docker
docker run -d \
  --name litellm-gemini \
  -p 4000:4000 \
  -e GEMINI_API_KEY=$GEMINI_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --model gemini/gemini-2.0-flash-exp \
  --drop_params
```
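If you take the Docker route, you can verify the container actually started before continuing (plain Docker commands, nothing LiteLLM-specific):

```bash
# Confirm the container is running
docker ps --filter name=litellm-gemini

# Tail the startup logs to check for errors
docker logs -f litellm-gemini
```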
### Step 3: Configure LiteLLM

Create `litellm_config.yaml`:
```yaml
model_list:
  - model_name: gemini-2-flash
    litellm_params:
      model: gemini/gemini-2.0-flash-exp
      api_key: os.environ/GEMINI_API_KEY
      drop_params: true
  - model_name: gemini-1.5-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GEMINI_API_KEY
      drop_params: true

litellm_settings:
  drop_params: true
  max_tokens: 8192  # Gemini default output limit
```

Start:

```bash
litellm --config litellm_config.yaml --port 4000
```
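With the proxy up, a test completion through its OpenAI-compatible endpoint should come back from Gemini. A minimal sketch, assuming no proxy auth key is set (add an `Authorization` header if you configure one):

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2-flash",
    "messages": [{"role": "user", "content": "Reply with the word ok."}]
  }'
```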
### Step 4: Configure Claude Code

Persistent (`~/.claude/settings.json`):
```json
{
  "environmentVariables": {
    "ANTHROPIC_BASE_URL": "http://localhost:4000",
    "ANTHROPIC_API_KEY": "your_gemini_api_key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "gemini-1.5-pro"
  }
}
```

Per-session:

```bash
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$GEMINI_API_KEY
claude
```

## Model Variants
### Gemini 2.0 Flash (Recommended)
Latest fast model:

- Context: 1M tokens
- Output: 8K tokens
- Strengths: Speed, multimodal, cost
- Cost: $0.075/M input, $0.30/M output
- Use for: Most tasks
### Gemini 1.5 Pro
Previous flagship:

- Context: 2M tokens (experimental)
- Output: 8K tokens
- Strengths: Reasoning, long context
- Cost: $1.25/M input, $5.00/M output
- Use for: Complex analysis
## Advanced Features

### 1M Context Window

Analyze entire codebases:
```bash
# Example: analyze a 500K-token codebase
claude
You: "Analyze this entire codebase for security vulnerabilities"

# Gemini can process ALL files at once
# Claude would need chunking or summarization
```

Ideal for:

- Full codebase analysis
- Long conversations (100+ messages)
- Large documentation sets
- Multi-file refactoring
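To check whether a codebase actually fits in the 1M window before sending it, a rough count is enough. A back-of-envelope sketch using the common ~4 characters/token heuristic; the file globs are placeholders to adjust for your repo:

```bash
# Rough token estimate: total bytes / 4 (crude heuristic, not a real tokenizer)
find . -type f \( -name '*.py' -o -name '*.ts' -o -name '*.md' \) \
  -exec cat {} + | wc -c | awk '{printf "~%d tokens\n", $1 / 4}'
```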
### Multimodal Capabilities

Vision (images):

```bash
# Gemini can analyze images in prompts
You: "Review this UI mockup and generate React components"
[Attach screenshot]

# Works with diagrams, screenshots, charts
```

Audio:

```bash
# Future: Audio transcription + analysis
# Currently limited via Claude Code integration
```

Video:

```bash
# Future: Video understanding
# Currently limited via Claude Code integration
```

### Thinking Mode
Enable for complex reasoning:

```yaml
litellm_params:
  model: gemini/gemini-2.0-flash-thinking-exp
```

Note: Thinking mode uses more tokens but improves quality.
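To expose the thinking variant alongside the other aliases, the same `model_list` pattern from Step 3 applies. The alias name below is a suggestion, not a fixed identifier:

```yaml
model_list:
  - model_name: gemini-2-flash-thinking  # hypothetical alias, pick any name
    litellm_params:
      model: gemini/gemini-2.0-flash-thinking-exp
      api_key: os.environ/GEMINI_API_KEY
      drop_params: true
```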
## Integration with agentful

### All Agents Work

```bash
/agentful-start    # Orchestrator
/agentful-product  # Architect

# All agents use Gemini automatically
```

### Recommended Configuration
| Agent | Model | Why |
|---|---|---|
| Orchestrator | gemini-2-flash | Fast planning |
| Architect | gemini-1.5-pro | Deep analysis |
| Backend | gemini-2-flash | Code generation |
| Frontend | gemini-2-flash | Multimodal (can see designs) |
| Tester | gemini-2-flash | Fast tests |
| Reviewer | gemini-2-flash | Code review |
| Fixer | gemini-2-flash | Quick fixes |
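This split maps onto the Step 4 environment variables: assuming the Architect is the agent that requests the Opus-class model, pointing the Opus slot at Gemini 1.5 Pro gives it the deeper model while everything else stays on Flash:

```json
{
  "environmentVariables": {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "gemini-1.5-pro"
  }
}
```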
## Cost Comparison

### 1M Token Workflow
```
Claude Sonnet 4.5:
  Input:  500K × $3.00/M  = $1.50
  Output: 500K × $15.00/M = $7.50
  Total:  $9.00

Gemini 2.0 Flash:
  Input:  500K × $0.075/M = $0.0375
  Output: 500K × $0.30/M  = $0.15
  Total:  $0.1875

Savings: $8.81 (98%)
```
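These totals are easy to re-derive on the command line. A quick check with `bc`, using the token counts in millions and the per-million prices from the table above:

```bash
# Claude: 0.5M input at $3.00/M plus 0.5M output at $15.00/M
echo 'scale=4; 0.5*3.00 + 0.5*15.00' | bc   # 9.000

# Gemini: the same token mix at Flash pricing
echo 'scale=4; 0.5*0.075 + 0.5*0.30' | bc   # .1875
```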
### Large Codebase Analysis

Scenario: analyze an 800K-token codebase.

Claude Sonnet 4.5:
- Requires chunking (context limit: 200K)
- 4 requests × $0.60 = $2.40
- Quality loss from chunking

Gemini 2.0 Flash:
- Single request: 800K × $0.075/M = $0.06
- No chunking needed
- Better context understanding

Savings: 97% + better quality
## Performance Comparison

### Code Generation (SWE-bench)

```
Claude Sonnet 4.5: 77.2%
Gemini 2.0 Flash:  ~65%
```

Trade-off: roughly 12 points lower on SWE-bench, but ~97% cheaper.
### Long Context (RULER)

```
Gemini 1.5 Pro:    98.7% (1M tokens)
Claude Sonnet 4.5: 94.2% (200K tokens)
```
### Multimodal Understanding

```
Gemini 2.0:        Best-in-class
Claude Sonnet 4.5: Good
GPT-4o:            Very good
```
## Troubleshooting

### Rate Limits
```bash
# Free tier: 1500 requests/day
# Paid tier: 360 requests/minute

# Check quota
curl "https://generativelanguage.googleapis.com/v1/models?key=$GEMINI_API_KEY"

# Upgrade if needed:
# https://console.cloud.google.com/billing
```
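When requests fail, first separate proxy problems from upstream Gemini problems. The LiteLLM proxy exposes health routes; a sketch assuming the setup above with no proxy auth key:

```bash
# Is the proxy process itself alive?
curl http://localhost:4000/health/liveliness

# Can the proxy reach its configured models?
curl http://localhost:4000/health
```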
### Quality Issues

```bash
# If quality is lower than expected:

# 1. Switch to Gemini 1.5 Pro
export ANTHROPIC_DEFAULT_SONNET_MODEL=gemini-1.5-pro

# 2. Enable thinking mode (in litellm_config.yaml):
#    model: gemini/gemini-2.0-flash-thinking-exp

# 3. Use Claude for critical tasks
export ANTHROPIC_BASE_URL=https://api.anthropic.com
```
### Context Length Errors

Gemini limits:

- Flash: 1M input, 8K output
- Pro: 2M input (experimental), 8K output

If you hit the output limit, reduce the response length:

```yaml
litellm_settings:
  max_tokens: 4096  # Reduce output length
```
## When to Use Gemini

Use Gemini for:
- ✅ Large codebase analysis (1M context)
- ✅ Cost optimization (97% cheaper)
- ✅ Multimodal tasks (images, video, audio)
- ✅ Long conversations
- ✅ High-volume operations
Use Claude for:
- ✅ Complex instruction following
- ✅ Production-critical code
- ✅ Best-in-class coding quality
- ✅ Superior function calling
- ✅ Thinking/reasoning tasks
Hybrid Strategy:

```bash
# Gemini for analysis, Claude for implementation
# 60% Gemini, 40% Claude
# Average savings: ~58%
```
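One way to run that split in practice is a pair of shell helpers that repoint Claude Code between the proxy and Anthropic's API. A sketch; the function names and `ANTHROPIC_REAL_API_KEY` are placeholders, not agentful conventions:

```bash
# Point Claude Code at the local Gemini proxy for analysis-heavy work
use_gemini() {
  export ANTHROPIC_BASE_URL=http://localhost:4000
  export ANTHROPIC_API_KEY=$GEMINI_API_KEY
}

# Point Claude Code back at Anthropic for implementation work
use_claude() {
  export ANTHROPIC_BASE_URL=https://api.anthropic.com
  export ANTHROPIC_API_KEY=$ANTHROPIC_REAL_API_KEY  # placeholder: your Anthropic key
}
```

Source these from your shell profile and switch with `use_gemini` or `use_claude` before starting a session.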
## Production Deployment

### Docker Compose
```yaml
services:
  litellm-gemini:
    image: ghcr.io/berriai/litellm:main-latest
    restart: unless-stopped
    ports:
      - "4000:4000"
    environment:
      GEMINI_API_KEY: ${GEMINI_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
```
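For unattended deployments, a container healthcheck catches a wedged proxy. A sketch assuming LiteLLM's liveliness route and that `curl` is available inside the image (if it is not, swap in `wget` or a small Python one-liner):

```yaml
services:
  litellm-gemini:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health/liveliness"]
      interval: 30s
      timeout: 5s
      retries: 3
```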
## Resources

- Gemini API: https://ai.google.dev
- Get API Key: https://aistudio.google.com/apikey
- Pricing: https://ai.google.dev/pricing
- LiteLLM Docs: https://docs.litellm.ai
## Next Steps
- Try DeepSeek for 93% cost savings
- Configure GLM-4.7 for 90% cost savings
- Run local models for complete privacy
- Learn cost optimization