# GLM-4.7 Setup

Use GLM-4.7 with agentful for roughly 80-85% lower token costs while maintaining competitive quality.

## Why GLM-4.7?
| Metric | GLM-4.7 | Claude Sonnet 4.5 | Difference |
|---|---|---|---|
| Input Cost | $0.60/M tokens | $3.00/M tokens | 80% cheaper |
| Output Cost | $2.20/M tokens | $15.00/M tokens | 85% cheaper |
| SWE-bench | 73.8% | 77.2% | -3.4 pts |
| Math Reasoning | 98.6 | 87.0 | +11.6 pts |
| Tool Invocation (τ²-Bench) | 84.7 | lower | GLM ahead |
| Context Window | 200K | 200K | Same |
| Output Capacity | 128K tokens | 8K tokens | 16x larger |
**Best For:**
- Cost-sensitive projects
- Mathematical/algorithmic tasks
- Tool-heavy workflows
- Bulk operations
- Large output requirements
## Quick Start (2 minutes)

### Option 1: Using the GLM CLI Tool (Easiest)

```bash
# Install the GLM CLI
npm install -g @xqsit94/glm

# Launch Claude Code with GLM
glm --model glm-4.7

# That's it! Now run agentful commands:
/agentful-start
```

### Option 2: Environment Variables
```bash
# Get an API key from https://z.ai
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
export ANTHROPIC_AUTH_TOKEN=your_zai_api_key

# Launch Claude Code
claude

# Run agentful
/agentful-start
```

### Option 3: Automatic Setup Script
```bash
# Download and run the setup script
curl -O "https://cdn.bigmodel.cn/install/claude_code_zai_env.sh"
bash ./claude_code_zai_env.sh

# Follow the prompts to configure, then launch
claude
```

## Detailed Setup
### Step 1: Get an API Key
- Visit Z.AI (international) or BigModel (China)
- Sign up for a free account
- Navigate to the API Keys section
- Generate a new API key
- Copy the key (it starts with `zai_` or similar)
**Pricing:**
- Free Tier: Limited requests for testing
- GLM Coding Plan Lite: $3/month (~2400 prompts)
- GLM Coding Plan Max: ~$10-20/month (production usage)
- Pay-as-you-go: $0.60/M input, $2.20/M output
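Once you have a key, you can optionally sanity-check it before wiring it into Claude Code. This is a minimal sketch using the same models endpoint shown in Troubleshooting below; adjust if Z.AI's API paths have changed:

```bash
# List available models to confirm the key works
# (an authentication error here means the key is wrong or inactive)
curl -s https://api.z.ai/v1/models \
  -H "Authorization: Bearer your_zai_api_key"
```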
### Step 2: Configure Claude Code

#### Persistent Configuration (Recommended)

Edit `~/.claude/settings.json`:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7"
  }
}
```
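If you prefer scripting this edit, here is a sketch using `jq` (it assumes `jq` is installed and that your settings file uses the `env` key shown above):

```bash
# Back up the settings file, then merge the GLM variables into it
cp ~/.claude/settings.json ~/.claude/settings.json.bak
jq --arg key "your_zai_api_key" '.env += {
  "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
  "ANTHROPIC_AUTH_TOKEN": $key
}' ~/.claude/settings.json.bak > ~/.claude/settings.json
```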
#### Session-Based Configuration

```bash
# Set for the current session only
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
export ANTHROPIC_AUTH_TOKEN=your_zai_api_key

# Launch Claude Code
claude
```

#### Per-Project Configuration
Add to your project's `.env`:

```
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
ANTHROPIC_AUTH_TOKEN=your_zai_api_key
```
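Note that a plain `.env` file may not be picked up automatically by the launched process; one common pattern is exporting it into the shell first (a sketch, assuming simple `KEY=value` lines with no quoting edge cases):

```bash
# Auto-export every variable defined in .env, then launch Claude Code
set -a        # export all variables assigned while this flag is on
source .env
set +a
claude
```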
### Step 3: Verify Setup

```bash
# Start Claude Code
claude

# Check status
/status

# Should show:
# Model: glm-4.7 (via Z.AI)
# Status: Connected
```

Test with a simple prompt:
```
You: What model are you?
Assistant: I am GLM-4.7, running via the Z.AI API.
```

## GLM Model Variants
### glm-4.7 (Recommended)

**Latest and most capable**

- Released: December 2025
- Context: 200K tokens
- Output: 128K tokens
- Strengths: Math reasoning, tool use, multilingual
- SWE-bench: 73.8%
- Cost: $0.60/M input, $2.20/M output
```
# Use in settings.json
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7"
```

### glm-4.6
**Previous generation**

- Context: 200K tokens
- Strengths: Still very capable, slightly cheaper
- SWE-bench: 68.0%
- Use Case: Fallback option
### glm-4.5-air

**Lightweight variant**

- Context: 128K tokens
- Speed: Faster responses
- Cost: Lower than 4.7
- Use Case: Simple tasks, prototyping
```
# Good for Haiku-level tasks
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air"
```

### glm-4-long

**Ultra-long context**

- Context: ultra-long (exact limit varies)
- Cost: $0.14/M tokens (both input/output)
- Use Case: Analyzing massive codebases
## Advanced Features

### Preserved Thinking Mode
GLM-4.7 can maintain reasoning state across conversation turns:
```
# Enable via API (future agentful feature)
{
  "clear_thinking": false,  # preserve reasoning
  "enable_thinking": true   # enable interleaved thinking
}
```

**Benefits:**
- 42.8% on HLE benchmark (+12.4% improvement)
- Consistent architectural decisions
- Reduced re-computation
- Better for multi-session projects
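For direct API use, here is a hypothetical curl sketch using the flags named above; the endpoint path, auth header, and exact parameter placement are assumptions, so verify them against the Z.AI docs linked in Resources:

```bash
# Hypothetical request that preserves reasoning state across turns
# (endpoint and parameter names are assumptions — check current Z.AI docs)
curl https://api.z.ai/api/paas/v4/chat/completions \
  -H "Authorization: Bearer your_zai_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Design the auth module."}],
    "enable_thinking": true,
    "clear_thinking": false
  }'
```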
### Large Output Generation

GLM-4.7 can generate up to 128K tokens in a single response (vs. 8K for Claude):
```
# Use case: generate an entire codebase
You: "Create a full e-commerce application with frontend, backend, and database"

GLM-4.7: [Generates 50+ files in a single response]
- Complete React frontend (20K tokens)
- Full Express backend (15K tokens)
- Database schemas (5K tokens)
- Docker configuration (3K tokens)
- Documentation (10K tokens)
```

### Context Caching
Reduce costs for repeated context:
```
# Cached tokens: $0.11/M (vs $0.60/M full price)
# Typically a 20-40% reduction in total costs

# Example: analyzing the same codebase multiple times
# First analysis: $0.60/M tokens
# Subsequent: $0.11/M tokens (cached)
```

### Multilingual Support
GLM-4.7 excels at Chinese/English bilingual coding:
```
# SWE-bench Multilingual: 66.7% (+12.9% over GLM-4.6)
You: "创建一个电商网站" (Create an e-commerce site)
GLM-4.7: [Generates code with Chinese comments and English identifiers]
```

## Integration with agentful
### All Agents Work Seamlessly
Once GLM is configured, all agentful agents work identically:
```bash
# Orchestrator
/agentful-start

# Architect
/agentful-product

# Backend/Frontend/Tester/Reviewer/Fixer
# All use GLM-4.7 automatically
```

### Recommended Agent Configuration
GLM-4.7 is particularly strong for:
**Architect Agent:**
- Superior at system design
- Excellent pattern recognition
- Cost-effective for large analysis

**Backend Agent:**
- Strong algorithmic performance
- Database optimization
- API design

**Tester Agent:**
- Test generation
- Edge case identification
- 128K output = comprehensive test suites

**Orchestrator Agent:**
- Tool orchestration (84.7 on τ²-Bench)
- Multi-step workflows
- Long-running sessions
## Cost Optimization Strategies

### 1. Hybrid Approach

Use GLM for most tasks and Claude for critical code (a blended-cost sketch follows the lists below):
```
# 70% GLM, 30% Claude
# Blended input cost: $1.32/M tokens (56% savings)

# 90% GLM, 10% Claude
# Blended input cost: $0.84/M tokens (72% savings)
```

**When to use GLM:**
- Bulk file analysis
- Algorithm implementation
- Test generation
- Documentation
- Refactoring
- Tool-heavy workflows
**When to use Claude:**
- Production-critical endpoints
- Security-sensitive code
- Final code review
- User-facing features
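To see how the blend affects your own input costs, here is a quick sketch using the input prices from the comparison table (GLM $0.60/M, Claude $3.00/M):

```bash
# Blended input cost for a given GLM share (0.0-1.0)
glm_share=0.9
awk -v g="$glm_share" 'BEGIN {
  blended = g * 0.60 + (1 - g) * 3.00
  printf "Blended input cost: $%.2f/M (%.0f%% savings vs Claude)\n",
         blended, (1 - blended / 3.00) * 100
}'
# -> Blended input cost: $0.84/M (72% savings vs Claude)
```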
### 2. Model Variant Routing
```bash
# Use glm-4.5-air for simple tasks
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"

# Use glm-4.7 for complex tasks
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
```
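For a one-off override without touching your shell profile, you can prefix the launch command (this relies on Claude Code reading these variables at startup, as above):

```bash
# This session only: route Sonnet-level requests to the lightweight variant
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.5-air claude
```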
### 3. Context Caching

```
# For repeated codebase analysis
# First run: full cost
# Subsequent runs: 81% cheaper (cached context)
```

## Performance Tips
### Math-Heavy Tasks
GLM-4.7 outperforms Claude on mathematical reasoning:
```
# Math benchmark: 98.6 vs Claude's 87.0
You: "Implement an optimal algorithm for longest increasing subsequence"

GLM-4.7:
- Correctly identifies the O(n log n) solution
- Provides a mathematical proof
- Explains the complexity analysis
- Generates test cases
- Produces an optimized implementation
```

### Tool-Heavy Workflows
GLM-4.7 excels at multi-tool orchestration:
```
# τ²-Bench: 84.7 (beats Claude)
You: "Research competitors, analyze the market, create a presentation"

GLM-4.7:
[Tool: web_search] → "AI coding assistants 2025"
[Tool: web_search] → "Market size AI development tools"
[Tool: analyze_data] → Process search results
[Tool: generate_slides] → Create presentation
[Tool: image_search] → Find charts/diagrams
[Success Rate: 85%]
```

### Large Codebase Generation
Use GLM's 128K output capacity:
You: "Generate complete microservices architecture for e-commerce"
GLM-4.7: [Generates in ONE response]
- API Gateway (10K tokens)
- Auth Service (15K tokens)
- Product Service (15K tokens)
- Order Service (15K tokens)
- Payment Service (10K tokens)
- Docker Compose (5K tokens)
- Kubernetes configs (10K tokens)
- Documentation (20K tokens)
Total: 100K tokens in single response
(Claude would need 12+ separate responses)Troubleshooting
### Connection Issues
```bash
# Test API connectivity
curl https://api.z.ai/v1/models \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN"

# Should return a list of available models
```

### Authentication Errors
```bash
# Check environment variables
echo $ANTHROPIC_BASE_URL
echo $ANTHROPIC_AUTH_TOKEN

# Should output:
# https://api.z.ai/api/anthropic
# zai_...your_key...
```

### Quality Issues
If responses are lower quality than expected:
- Check the model variant: ensure you're using `glm-4.7` (not `glm-4.5-air`)
- Enable thinking mode: better for complex tasks
- Adjust the temperature: lower (0.2) for code, higher (0.7) for creative work
- Provide more context: GLM benefits from detailed prompts
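If you want to pin the temperature outside Claude Code, here is a direct-API sketch; it assumes the Z.AI proxy accepts the standard Anthropic Messages shape at `/v1/messages`, so verify before relying on it:

```bash
# Low-temperature request for more deterministic code output
# (the /v1/messages path on the proxy base URL is an assumption)
curl "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-4.7",
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "Write a binary search function."}]
  }'
```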
### Rate Limiting
```
# GLM rate limits (typical):
# Free tier: ~100 requests/day
# Lite plan: ~2400 prompts per 5 hours
# Max plan: production-level usage

# If rate limited:
# 1. Upgrade your plan
# 2. Use caching to reduce requests
# 3. Add retry logic with backoff (see the sketch below)
```
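A minimal retry-with-backoff sketch around any flaky call, here the models-list check from Connection Issues; the delay doubles on each failure:

```bash
# Retry up to 5 times with exponential backoff (2s, 4s, 8s, 16s, 32s)
for attempt in 1 2 3 4 5; do
  if curl -sf https://api.z.ai/v1/models \
       -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" > /dev/null; then
    echo "OK after $attempt attempt(s)"
    break
  fi
  sleep $((2 ** attempt))
done
```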
### Long Context Degradation

GLM-4.7's output quality can occasionally degrade beyond 150K tokens of context:
```
# Solution 1: Explicit guidance
You: "Focus on authentication modules in files 50-75"

# Solution 2: Chunk large contexts
# Break a 200K context into 2x 100K requests (see the sketch below)

# Solution 3: Use the glm-4-long variant
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-4-long"
```
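A sketch of Solution 2 for codebase analysis: split a file manifest in half and run two smaller requests (uses GNU `split`; the paths and prompt wording are illustrative):

```bash
# Build a manifest, split it into two halves, and prompt once per half
find src -name '*.py' | sort > files.txt
split -n l/2 files.txt chunk_
for c in chunk_*; do
  echo "Analyze only these files: $(paste -sd' ' "$c")"
done
```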
## Switching Back to Claude

### Temporary Switch
```bash
# Unset GLM configuration
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN

# Set Claude key
export ANTHROPIC_API_KEY=your_claude_key

# Launch Claude Code
claude
```

### Permanent Switch
Edit `~/.claude/settings.json`: remove the GLM entries and set your Claude key (JSON does not allow comments, so delete the GLM lines outright):

```json
{
  "env": {
    "ANTHROPIC_API_KEY": "your_claude_key"
  }
}
```

### Per-Task Switching
```bash
# Use GLM for bulk analysis
glm --model glm-4.7
/analyze-codebase

# Exit, then use Claude for production code
claude
/generate-api-endpoint
```

## Community Tools
### GLM CLI Manager
```bash
# Install
npm install -g @xqsit94/glm

# Features
glm --model glm-4.7    # Launch with a specific model
glm token set YOUR_KEY # Manage API keys
glm token show         # View the current key
glm --help             # See all options
```

### Claude Code GLM Switcher
```bash
# Install
npm install -g claude-glm-switcher

# Quick launch
claude-glm      # GLM-4.7
claude-glm-air  # GLM-4.5-Air
claude          # Back to Claude
```

### MCP Servers
**GLM Math Co-Processor** (robertcprice/glm-mcp-server):
```json
{
  "mcpServers": {
    "glm-math": {
      "command": "node",
      "args": ["glm-mcp-server/index.js"],
      "env": { "ZHIPU_API_KEY": "your_key" }
    }
  }
}
```

Use Claude for orchestration and GLM for math-heavy operations.
## Resources
- Z.AI Docs: https://docs.z.ai/
- GLM-4.7 Overview: https://docs.z.ai/guides/llm/glm-4.7
- Function Calling: https://docs.z.ai/guides/capabilities/function-calling
- Pricing: https://docs.z.ai/guides/overview/pricing
- Hugging Face: https://huggingface.co/zai-org/GLM-4.7
- Community Discord: https://discord.gg/agentful
## Next Steps
- Set up DeepSeek for thinking mode workflows
- Configure Gemini for 1M context
- Try local models for privacy
- Learn cost optimization