Google Gemini 2.0 Setup

Use Google Gemini 2.0 with agentful through a LiteLLM proxy for a 1M-token context window and multimodal capabilities.

Why Gemini 2.0?

Feature         Gemini 2.0 Flash    Claude Sonnet 4.5
Input Cost      $0.075/M            $3.00/M
Output Cost     $0.30/M             $15.00/M
Context         1M tokens           200K tokens
Thinking Mode   ✅ (experimental)
Vision          ✅ Advanced
Audio           ✅ Native
Video           ✅ Native
Best For:
  • Analyzing entire codebases (1M context!)
  • Multi-modal tasks (image/video/audio)
  • Cost-sensitive projects (97% cheaper)
  • Long documents/conversations

Quick Start (5 minutes)

# Install LiteLLM
pip install 'litellm[proxy]'
 
# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY=AIzaSy...
 
# Start proxy
litellm --model gemini/gemini-2.0-flash-exp --drop_params --port 4000
 
# Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$GEMINI_API_KEY
 
claude
/agentful-start
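
Before launching Claude Code, you can smoke-test the proxy through LiteLLM's OpenAI-compatible endpoint (a minimal sketch; it assumes no LiteLLM master key is set, so no auth header is needed):

# Ask the proxy for a one-line reply to confirm it is translating requests
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/gemini-2.0-flash-exp", "messages": [{"role": "user", "content": "Say hello."}]}'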

Detailed Setup

Step 1: Get Gemini API Key

  1. Visit aistudio.google.com/apikey
  2. Create new API key
  3. Copy key (starts with AIzaSy...)

Pricing (as of Jan 2025):

  • Gemini 2.0 Flash: $0.075/M input, $0.30/M output
  • Gemini 1.5 Pro: $1.25/M input, $5.00/M output
  • Free tier: 1500 requests/day

Step 2: Install LiteLLM

pip install 'litellm[proxy]'
 
# Or with Docker
docker run -d \
  --name litellm-gemini \
  -p 4000:4000 \
  -e GEMINI_API_KEY=$GEMINI_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --model gemini/gemini-2.0-flash-exp \
  --drop_params
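
If you used the container, confirm it came up and is listening (standard Docker commands, nothing agentful-specific):

# Check container logs, then hit the model listing endpoint
docker logs litellm-gemini
curl -s http://localhost:4000/v1/models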

Step 3: Configure LiteLLM

Create litellm_config.yaml:

model_list:
  - model_name: gemini-2-flash
    litellm_params:
      model: gemini/gemini-2.0-flash-exp
      api_key: os.environ/GEMINI_API_KEY
      drop_params: true
 
  - model_name: gemini-1.5-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GEMINI_API_KEY
      drop_params: true
 
litellm_settings:
  drop_params: true
  max_tokens: 8192  # Gemini default output limit

Start:

litellm --config litellm_config.yaml --port 4000
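
Both aliases should now resolve; one quick check (the alias names are the model_name values defined above):

# Send a small test request through the gemini-2-flash alias
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2-flash", "messages": [{"role": "user", "content": "ping"}]}'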

Step 4: Configure Claude Code

Persistent (~/.claude/settings.json):

{
  "environmentVariables": {
    "ANTHROPIC_BASE_URL": "http://localhost:4000",
    "ANTHROPIC_API_KEY": "your_gemini_api_key",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gemini-2-flash",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "gemini-1.5-pro"
  }
}

Session-based:

export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=$GEMINI_API_KEY
claude

Model Variants

Gemini 2.0 Flash (Recommended)

Latest fast model
  • Context: 1M tokens
  • Output: 8K tokens
  • Strengths: Speed, multimodal, cost
  • Cost: $0.075/M input, $0.30/M output
  • Use for: Most tasks

Gemini 1.5 Pro

Previous flagship
  • Context: 2M tokens (experimental)
  • Output: 8K tokens
  • Strengths: Reasoning, long context
  • Cost: $1.25/M input, $5.00/M output
  • Use for: Complex analysis

Advanced Features

1M Context Window

Analyze entire codebases:

# Example: Analyze 500K token codebase
claude
 
You: "Analyze this entire codebase for security vulnerabilities"
 
# Gemini can process ALL files at once
# Claude would need chunking or summarization

Use cases:
  • Full codebase analysis (a quick fit-check is sketched after this list)
  • Long conversations (100+ messages)
  • Large documentation sets
  • Multi-file refactoring
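
A rough way to check whether a repo fits in the window, using the common ~4 characters per token heuristic (a sketch only; real tokenization varies by model):

# Estimate token count for source files: chars / 4 ≈ tokens
chars=$(find . -type f \( -name '*.py' -o -name '*.ts' -o -name '*.go' \) -exec cat {} + | wc -c)
echo "~$((chars / 4)) tokens (1M window ≈ 4,000,000 chars)"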

Multimodal Capabilities

Vision (images):

# Gemini can analyze images in prompts
You: "Review this UI mockup and generate React components"
[Attach screenshot]

# Works with diagrams, screenshots, charts

Audio (Gemini 2.0 only):

# Future: audio transcription + analysis
# Not yet exposed through the Claude Code integration

Video (Gemini 2.0 only):

# Future: video understanding
# Not yet exposed through the Claude Code integration

Thinking Mode

Enable it for complex reasoning by adding a thinking variant to the model_list in litellm_config.yaml:

  - model_name: gemini-2-flash-thinking
    litellm_params:
      model: gemini/gemini-2.0-flash-thinking-exp
      api_key: os.environ/GEMINI_API_KEY

Note: Thinking mode uses more tokens but improves quality.
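
To route traffic there, point a Claude Code model tier at the new alias (the same mechanism as Step 4):

export ANTHROPIC_DEFAULT_SONNET_MODEL=gemini-2-flash-thinking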


Integration with agentful

All Agents Work

/agentful-start  # Orchestrator
/agentful-product  # Architect
# All agents use Gemini automatically

Recommended Configuration

Agent         Model           Why
Orchestrator  gemini-2-flash  Fast planning
Architect     gemini-1.5-pro  Deep analysis
Backend       gemini-2-flash  Code generation
Frontend      gemini-2-flash  Multimodal (can see designs)
Tester        gemini-2-flash  Fast tests
Reviewer      gemini-2-flash  Code review
Fixer         gemini-2-flash  Quick fixes
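
With the Step 4 tier mapping, this table reduces to three environment variables (assuming agentful's Architect runs on the Opus tier and the other agents on the Sonnet/Haiku tiers, as the Step 4 config implies):

# Fast model for most agents, Pro reasoning for the Architect
export ANTHROPIC_DEFAULT_SONNET_MODEL=gemini-2-flash
export ANTHROPIC_DEFAULT_HAIKU_MODEL=gemini-2-flash
export ANTHROPIC_DEFAULT_OPUS_MODEL=gemini-1.5-pro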

Cost Comparison

1M Token Workflow

Claude Sonnet 4.5:
Input: 500K × $3/M = $1.50
Output: 500K × $15/M = $7.50
Total: $9.00
 
Gemini 2.0 Flash:
Input: 500K × $0.075/M = $0.0375
Output: 500K × $0.30/M = $0.15
Total: $0.1875
 
Savings: $8.81 (98%)

Large Codebase Analysis

Scenario: Analyze 800K token codebase
 
Claude Sonnet 4.5:
- Requires chunking (context limit: 200K)
- 4 requests × $0.60 = $2.40
- Quality loss from chunking
 
Gemini 2.0 Flash:
- Single request: 800K × $0.075/M = $0.06
- No chunking needed
- Better context understanding
 
Savings: 97% + better quality
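
To rerun these numbers for your own workloads, a minimal calculator (plain arithmetic; the rates are the Jan 2025 prices listed above):

# cost <input_tokens> <output_tokens> <input_$_per_M> <output_$_per_M>
cost() { echo "scale=4; ($1 * $3 + $2 * $4) / 1000000" | bc; }

cost 500000 500000 0.075 0.30   # Gemini 2.0 Flash -> .1875
cost 500000 500000 3.00 15.00   # Claude Sonnet 4.5 -> 9.0000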

Performance Comparison

Code Generation (SWE-bench)

Claude Sonnet 4.5:  77.2%
Gemini 2.0 Flash:   ~65%

Trade-off: -12% quality, but 97% cheaper

Long Context (RULER)

Gemini 1.5 Pro:     98.7% (1M tokens)
Claude Sonnet 4.5:  94.2% (200K tokens)
Gemini wins on long context!

Multimodal Understanding

Gemini 2.0:         Best-in-class
Claude Sonnet 4.5:  Good
GPT-4o:             Very good

Troubleshooting

Rate Limits

# Free tier: 1500 requests/day
# Paid tier: 360 requests/minute
 
# Verify your key and list available models (this endpoint does not report quota)
curl "https://generativelanguage.googleapis.com/v1/models?key=$GEMINI_API_KEY"
 
# Upgrade if needed
# https://console.cloud.google.com/billing
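
If you hit the ceiling mid-session, LiteLLM can retry and fall back to another alias. A sketch extending the litellm_settings block from Step 3 (num_retries and fallbacks are LiteLLM settings; verify them against the LiteLLM docs for your version):

litellm_settings:
  drop_params: true
  max_tokens: 8192
  num_retries: 3  # retry transient 429s
  fallbacks: [{"gemini-2-flash": ["gemini-1.5-pro"]}]  # overflow to the Pro alias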

Quality Issues

# If quality is lower than expected:
 
# 1. Switch to Gemini 1.5 Pro
export ANTHROPIC_DEFAULT_SONNET_MODEL=gemini-1.5-pro
 
# 2. Enable thinking mode (in litellm_config.yaml)
model: gemini/gemini-2.0-flash-thinking-exp

# 3. Use Claude for critical tasks (reset ANTHROPIC_API_KEY to your Anthropic key)
export ANTHROPIC_BASE_URL=https://api.anthropic.com

Context Length Errors

# Gemini limits:
# - Flash: 1M input, 8K output
# - Pro: 2M input (experimental), 8K output
 
# If responses are cut off at the output limit, cap them lower:
litellm_settings:
  max_tokens: 4096  # reduce output length per request

When to Use Gemini

Use Gemini for:

  • ✅ Large codebase analysis (1M context)
  • ✅ Cost optimization (97% cheaper)
  • ✅ Multimodal tasks (images, video, audio)
  • ✅ Long conversations
  • ✅ High-volume operations

Use Claude for:

  • ✅ Complex instruction following
  • ✅ Production-critical code
  • ✅ Best-in-class coding quality
  • ✅ Superior function calling
  • ✅ Thinking/reasoning tasks

Hybrid Strategy:

# Gemini for analysis, Claude for implementation
# 60% of traffic on Gemini at ~97% savings + 40% on Claude at full price
# ≈ 58% average savings
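
One way to make switching cheap is a pair of shell helpers (hypothetical names; they just flip the Step 4 environment variables):

# Toggle between the Gemini proxy and the Anthropic API
use_gemini() {
  export ANTHROPIC_BASE_URL=http://localhost:4000
  export ANTHROPIC_API_KEY=$GEMINI_API_KEY
}
use_claude() {
  export ANTHROPIC_BASE_URL=https://api.anthropic.com
  export ANTHROPIC_API_KEY=$ANTHROPIC_REAL_KEY  # hypothetical var holding your Anthropic key
}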

Production Deployment

Docker Compose

services:
  litellm-gemini:
    image: ghcr.io/berriai/litellm:main-latest
    restart: unless-stopped
    ports:
      - "4000:4000"
    environment:
      GEMINI_API_KEY: ${GEMINI_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]
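
Bring it up and confirm the proxy is serving:

docker compose up -d
curl -s http://localhost:4000/v1/models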
